We make use of a new data resource, merged birth and school 
records for all children born in Florida from 1992 to 2002, to study 
the relationship between birth weight and cognitive development. 
Using singletons as well as twin and sibling fixed effects models, 
we find that the effects of early health on cognitive development are 
essentially constant through the school career; that these effects are 
similar across a wide range of family backgrounds; and that they 
are invariant to measures of school quality. We conclude that the 
effects of early health on adult outcomes are therefore set very early. 
(JEL I12, J13, J24) 

A large literature documents the effects of neonatal health (commonly proxied 
by birth weight) on a wide range of adult outcomes such as wages, disability, adult 
chronic conditions, and human capital accumulation. A series of studies, conducted 
in a variety of countries including Canada, Chile, China, Norway, and the United 
States, have made use of twin comparisons to show that the heavier twin of the pair 
is more likely to have better adult outcomes measured in various ways.¹ 
¹ Examples of influential previous research include, for the United States: Behrman and Rosenzweig (2004) 
on schooling and wages; Almond, Chay, and Lee (2005) and Conley, Strully, and Bennett (2003) on neonatal out- 
While the existing literature makes clear that there appears to be a permanent 
effect of poor neonatal health on socioeconomic and health outcomes, it is important 
for a variety of policy reasons to know how poor neonatal health affects child devel- 
opment, and whether there are public policies that might act to remediate the nega- 
tive relationship between early poor health and later-life outcomes. Knowing this 
relationship can also be useful in helping to understand whether favorable health 
at birth can shield children against adverse shocks, policy or otherwise. However, 
we know very little to date about whether the effects of poor neonatal health on 
cognitive development vary at different ages (say, at kindergarten entry versus third 
grade versus eighth grade) , and no existing study identifìes whether public policies 
such as school quality could help to mitigate the effects of poor neonatal health on 
cognitive development. We also know very little about whether these effects vary 
heterogeneously across different demographic or socioeconomic groups, or whether 
early neonatal health and parental inputs are complements or substitutes. While we 
have strong evidence from twin comparison studies that poor initial health conveys 
a disadvantage in adulthood, we have little information about the potential roles for 
policy interventions in ameliorating this disadvantage during childhood. 

The principal reason for these gaps in the literature involves data availability. The 
datasets that previous researchers have used to study the effects of poor neonatal 
health on adult outcomes (e.g., Scandinavian registry data, or data matching a mother's 
birth certificate to her children's birth certificates) do not include information on 
schooling and human capital measures during key developmental years.² 

Another gap in the adult-outcomes literature is that the subjects of that literature 
were necessarily born in the 1970s and earlier. Given the advances in modern neonatology, 
it is reasonable to believe that poor neonatal health in the twenty-first century 
may bear little resemblance to poor neonatal health 50 years ago.³ There have been 
no studies linking neonatal health to either educational or later outcomes in a highly 
developed country context using very recent birth cohorts.⁴ 

We make use of a major new data source that can help fill these gaps in the 
literature. We match all births in Florida from 1992 to 2002 to subsequent schooling 
records for those remaining in the state to attend public school. Florida is an excellent 



neonatal outcomes, health outcomes in adolescence, educational attainment, and social assistance take-up. For 
China: Rosenzweig and Zhang (2013) on educational attainment, wages, and weight for height. And for Chile: 
Torche and Echevarria (2011) on fourth-grade mathematics test scores. Bharadwaj, Eberhard, and Neilson (2013), 
in a current working paper, study fourth-grade test scores and grades in school (also in Chile). 

² Exceptions include Bharadwaj, Eberhard, and Neilson (2013); Torche and Echevarria (2011); and Rosenzweig 
and Zhang (2009), which examine this relationship in developing countries with less access to advances in medical 
technology that have reduced the lower end of viable birth weights, and in settings that lack the socioeconomic 
and ethnic diversity present in the data from Florida used in this paper. Another alternative data source is the Early 
Childhood Longitudinal Study, Birth Cohort (ECLS-B) of children born in the United States in 2001, which oversamples 
twins. However, the ECLS-B is too recent to investigate outcomes in late elementary school or adolescence, 
too small to study heterogeneous effects of birth weight, and does not include cognitive outcomes which have high 
stakes for children. 

³ One example of the temporal differences in neonatology is that, whereas 50 years ago the threshold for infant 
viability was around 1,500 grams, today the threshold for viability in developed countries is as low as 500 grams 
or even lower (Lau et al. 2013). Thus, it is independently valuable to study the effects of birth weight using a more 
contemporary set of births than those used in the existing literature. 

⁴ The potential benefits of using more current data from a highly developed country become apparent when we 
compare the mean birth weight among twins in our study of children born after 1992 (2,410 grams) to those from 
previous studies of twins from highly developed countries born in the 1930s through the 1970s (which range from 
2,517 to 2,598 grams, depending on the cohort and country) and those from the late 1990s in Chile (2,500 grams). 
place to study these questions because it is large (its population of around 17 million 
compares to Norway, Denmark, and Sweden combined) and heterogeneous 
(nearly one-half of mothers are racial or ethnic minorities, and nearly one-quarter of 
mothers were born outside the United States). In addition, Florida has some of the 
strongest education data systems in the United States, and Florida has been testing 
children annually from third through tenth grades for well over a decade. With these 
new data, we follow over 1.3 million singletons and nearly 15,000 pairs of twins 
from birth through middle school to study the relationship between birth weight and 
cognitive development. 

We find that neonatal health, as measured by birth weight, affects cognitive development 
in childhood, and that this relationship is remarkably consistent across 
subgroups from a wide range of family socioeconomic status (SES).⁵ We observe 
this relationship for twin-pair comparisons, sibling-pair comparisons, and single- 
tons, and while the magnitudes of these comparisons differ somewhat, they provide 
reasonable bounds of the likely effects of neonatal health on children's cognitive 
outcomes. 

Comparing across a range of demographic and socioeconomic dimensions allows 
us to address both the stability of results across background and the degree to which 
parental inputs and early health are complements or substitutes. Understanding this 
complementarity is important because it provides a window into the mechanisms 
by which neonatal health and parental resources and behavior contribute to human 
capital development. Whether parental inputs and neonatal health are complements 
or substitutes also has important implications theoretically for understanding the 
distributional effects of investments in infant health, and for guiding the targeting 
of policies intended to reduce inequalities by improving early life health (e.g., consider 
the role complementarities play in the models of human capital accumulation 
of Cunha et al. 2006; Cunha and Heckman 2007; and Conti and Heckman 2010). 
We find evidence that the effects of birth weight on student outcomes are stronger 
for higher SES families than for lower SES families, suggesting that neonatal health 
and parental inputs are at least to some degree complements. Such complementarity 
could be driven by parents with more resources investing more in children with 
better neonatal health, or could be the result of parents making equal investments 
but those investments by more educated higher SES parents being relatively more or 
less effective at building the human capital of children born with better initial health. 

Importantly, ours is the first study to explore the interaction between schooling 
factors and the relationship between birth weight and children's cognitive devel- 
opment. Once children reach school age, they spend considerably more time with 
adults who are not their parents than they did before school age. Schooling is the 

⁵ We are certainly not the first paper to conduct heterogeneity analyses of families with twins. Black, Devereux, 
and Salvanes (2007) mention that they investigated sample splits by income and education and find no significant 
differences, but do not report their subgroup-specific findings, making it impossible to address the question of 
whether parental inputs and early health are complements or substitutes. Oreopoulos et al. (2008) report results bro- 
ken down by birth weight group, gestational length, and APGAR scores, but not by different socioeconomic groups. 
Johnson and Schoeni (2011) report results by parental age and the presence of health insurance, which could 
reflect a variety of factors other than the key questions that we are interested in studying. Bharadwaj, Eberhard, and 
Neilson's (2013) working paper and Torche and Echevarria (2011) split their analyses by maternal education, but 
the developing Chilean context at the time means that Bharadwaj, Eberhard, and Neilson (2013) only split by high 
school and over versus middle school or lower education. 
most natural place where public policy can play a role in promoting cognitive development 
amongst children in nonfamilial settings. We seek to understand the degree 
to which school quality can help to overcome disadvantages associated with poor 
neonatal health. We find that the relationship between birth weight and cognitive 
outcomes is invariant to a variety of measures of school quality, suggesting that 
while high quality schools have the potential to improve the outcomes of all children, 
they do not reduce the gaps generated by poor neonatal health. 

I. A New Data Source 

A. Description of the Dataset and Match Diagnostics 

We make use of matched data for all children born in Florida between 1992 
and 2002 and educated in a Florida public school between 1996 and 2012. For the 
purposes of this study, Florida's education and health agencies matched children 
along three dimensions: first and last names, exact date of birth, and social secu- 
rity number, with a small degree of fuzziness permitted in the match. Common 
variables excluded from the match were used as checks of match quality. These 
checks confirm that the matches are very clean: in the overall population, the sex 
recorded on birth records disagreed with the sex recorded in school records in about 
one one-thousandth of 1 percent of cases, suggesting that these differences are almost 
surely due to typos in the birth or school records. 

Between 1992 and 2002, 2,047,663 births were recorded by the Florida Bureau 
of Vital Statistics, including 22,625 pairs of twins. Of these children, 1,652,333 
were subsequently observed in Florida public school data maintained by the Florida 
Department of Education's Education Data Warehouse, and 17,639 pairs of twins 
have both twins present in the Department of Education data. All told, 80.7 percent 
of all children born in Florida, and 79.5 percent of all twins born in Florida, were 
matched to school records using the match protocols. 
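
The overall match rate quoted above follows directly from the reported counts. The short Python sketch below redoes the arithmetic; the function name `match_rate` is ours, introduced only for illustration. The pair-level share it computes (both twins present in the school data) is a different quantity from the 79.5 percent of individual twins matched, so the two need not coincide.

```python
def match_rate(matched: int, total: int) -> float:
    """Return the share matched, in percent."""
    return 100.0 * matched / total

# Counts reported in the text.
births_total = 2_047_663      # births recorded, 1992 to 2002
births_matched = 1_652_333    # children later observed in public school data
twin_pairs_total = 22_625     # twin pairs among those births
twin_pairs_both = 17_639      # pairs with both twins in the school data

overall_rate = match_rate(births_matched, births_total)
pair_rate = match_rate(twin_pairs_both, twin_pairs_total)

print(f"all children matched: {overall_rate:.1f} percent")
print(f"twin pairs with both twins matched: {pair_rate:.1f} percent")
```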

In order to judge the quality of the match, we compare the 80.7 percent rate to 
population statistics from the American Community Surveys and census of population 
from 2000 to 2009.⁶ Recall that a child can only be matched in the Florida data 
if he or she (i) is born in Florida; (ii) remains in the state of Florida until school age; 
(iii) attends a Florida public school; and (iv) is successfully matched between birth 
and school records using the protocol described above. Reasons (i) through (iii) 
are "natural" reasons why we might lose children from the match. Our calculations 
from the American Community Survey indicate that, among the kindergarten-aged 
children found in that survey who were born in Florida, 80.9 percent remained 
in Florida at the time of kindergarten and were attending public school.⁷ We 

⁶ The benefit of non-name unique match identifiers in Florida becomes apparent when we compare our 80.7 percent 
match rate to the match rate in North Carolina, the only other state where, to our knowledge, researchers are 
making use of matched birth-school data today. The cleanest North Carolina match rate, which relies on children 
being matched by name, date of birth, and county, is just over 50 percent, and when the match is made less exactly, 
just on name and date of birth, the match rate in North Carolina is between 60 and 65 percent, depending on sub- 
group (Ladd, Muschkin, and Dodge 2012). 

⁷ The 80.9 percent figure is an overstatement of the true expected match rate because the American Community 
Survey includes only children who are still living in the United States at the time of kindergarten. Given that some 
children born in Florida leave the country in their first five years, because of emigration, because they were born 
Table 1. Representativeness of the Florida Test Score and Twin Population 
Notes: The first column presents fractions in total population of children born in Florida between 1992 and 2002. 
The second column presents fractions in total population of children born between 1992 and 2002 linked to Florida 
school records. The third column presents fractions in total population of children born between 1992 and 2002 for 
whom we observe a third-grade test score. The fourth column presents fractions in total population of twin pairs born 
between 1992 and 2002 for whom we observe third-grade test scores. We restrict columns 3 and 4 to observations 
that include full information on the birth certificate. 



therefore conclude that the match rate is extremely high, and that nearly all potentially 
matchable children have been matched in our data. 

B. Comparisons of the Matched Dataset to the Overall Population 

The set of Florida-born children attending Florida public schools differs fundamentally 
from the set of all Florida-born children. Twins also 
differ from singletons in important ways: they have a lower mean gestational age 
and a lower mean birth weight than singletons, and they have older and more educated 
mothers, as well as mothers who are more likely to be married (Antsaklis, Malamas, 
and Sindos 2013). We discuss issues of external validity in the conclusion. 



Table 1 presents some evidence regarding the overall representativeness of the 
population of children matched to schools and the population of twins, along a num- 
ber of dimensions: maternal race and ethnicity, maternal education, maternal age, 
maternal immigrant status, and parental marital status. There are four columns in the 
table: the first column reflects the total population of children born in Florida; the 
second reflects the population of children matched to Florida public school records; 
the third represents the set of children with a third-grade test score; and the fourth 
reflects the set of twins born in Florida who have a third-grade test score. (Children 
in these last two columns also must fulfill the other data requirements, such as non- 
missing core control variables, for inclusion in the study.) The comparison between 
the first and second columns makes clear the costs associated with carrying out 
this type of analysis in the United States, where children are lost for matching if 



to nonimmigrant visitors to the country, or because they were born to undocumented immigrants who returned to 
their home countries, the true expected match rate is somewhat below 80.9 percent. 
they cross state lines between birth and school or if they attend private school. We 
observe that the set of matched children are more likely to be Black (24.8 percent of 
matched children versus 22.6 percent of all children) and less likely to have married 
mothers (62.2 percent versus 64.8 percent of all children). The mothers of matched 
children are less educated (17.1 percent college graduate versus 
20.1 percent overall, and 22.5 percent high-school dropout versus 20.9 percent overall) 
and are moderately younger (23.6 percent aged 21 or below versus 22.0 percent 
overall, and 9.3 percent aged 36 or above versus 9.8 percent overall). 

The comparison between the second and third columns of Table 1 shows the dif- 
ference in composition of the population of test takers in elementary school versus 
those matched to school records more generally. Third-grade test takers are still lower 
in terms of socioeconomic status than are all children appearing in public school 
data. The fact that matched children are of somewhat lower socioeconomic status, 
and that those with third-grade scores are somewhat lower again, is unsurprising, 
given the well-documented relationship between family income (or parental education) 
and private school attendance.⁸ However, our findings of estimated relationships 
between birth weight and test scores that are remarkably similar across very 
dissimilar groups reduce some of the potential concerns regarding external validity. 

The comparison between the third and fourth columns of Table 1 demonstrates 
the consequences of making use of twin comparisons. Mothers of twins are quite 
different from the overall population: mothers of twins are substantially less likely 
to be Hispanic or foreign-born and substantially more likely to be married than are 
mothers of singletons. In addition, they are considerably better educated (23.1 per- 
cent college graduate versus 15.8 percent in the overall population of test takers, 
and 15.5 percent high school dropout versus 23.3 percent of all test takers) and 
considerably older (13.6 percent aged 36 or above versus 9.2 percent in the overall 
population of test takers, and 14.4 percent aged 21 or below versus 24.2 percent 
in the overall population of test takers).⁹ Therefore, the decision to focus on twin 
comparisons to promote increased internal validity brings with it some cost in terms 
of external validity. In this paper, we therefore present evidence on the relationship 
between birth weight and cognitive development both in the case of twin comparisons, 
where internal validity is greatest, and in the case of singletons, where 
external validity is greatest. Our general patterns of results are quite similar across 
both cases. 

C. Birth Weight Distributions 

The variation that we use to identify the effect of poor neonatal health on cognitive 
skills comes from the fact that nearly all twin pairs differ in the birth weights 
of the two newborns, and sometimes the difference is substantial. In Florida, the 



⁸ These relationships are observed in the census data: in the 2000 census, for instance, 6 percent of families 
earning $25,000 or less per year sent their children to private school, as compared with 7 percent for those earning 
$25,000-$50,000 per year, 13 percent for those earning $50,000-$75,000 per year, and 19 percent for those earning 
over $75,000 per year. 

⁹ Twins are also more likely to be the consequence of in vitro fertilization (IVF) or other forms of assisted reproductive 
technologies (ART). Later in this paper we investigate the differential effects of birth weight for twins likely 
conceived using IVF/ART versus those less likely to have been conceived using IVF/ART. 
Figure 1. Discordance in Birth Weight among Twins Born in Florida between 1992 and 2002 



Notes: Figure 1 plots kernel density distributions of within-twin-pair difference in birth weight 
for all twin births in Florida (solid line) between 1992 and 2002 and twin births who were 
born in Florida and were successfully matched to Florida public school records (dashed line). 
Distributions are censored at 2,000 grams for the sake of clarity. 



average discordance in twins' birth weight is 284 grams (gr) (0.63 pounds), or 11.8 
percent of the average twin's birth weight of 2,410 gr. Figure 1 shows that the 
distribution of discordance for all twins is virtually identical to the distribution of 
discordance for twins matched to test scores. Of twin pairs, 51.4 percent have birth 
weight discordance over 200 gr, and 16.8 percent have birth weight discordance 
over 500 gr. Forty-five percent of twin pairs have birth weight discordance greater 
than 10 percent of the larger twin's birth weight, 26.6 percent have discordance 
greater than 15 percent of the larger twin's birth weight, and 14.7 percent have discordance 
greater than 20 percent of the larger twin's birth weight.¹¹ 
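
As a check on the summary figures above, the sketch below computes discordance in two ways: in grams, and as a share of the heavier twin's birth weight (the definition behind the percentage thresholds). The helper names and the example pair of weights are hypothetical; only the 284 gr and 2,410 gr figures come from the text.

```python
GRAMS_PER_POUND = 453.59237

def discordance_grams(bw_a: float, bw_b: float) -> float:
    """Absolute within-pair difference in birth weight, in grams."""
    return abs(bw_a - bw_b)

def discordance_share(bw_a: float, bw_b: float) -> float:
    """Within-pair difference as a fraction of the heavier twin's weight."""
    heavier = max(bw_a, bw_b)
    return abs(bw_a - bw_b) / heavier

# Figures reported in the text.
mean_discordance_gr = 284.0
mean_twin_bw_gr = 2410.0
share_of_mean = 100.0 * mean_discordance_gr / mean_twin_bw_gr  # in percent
in_pounds = mean_discordance_gr / GRAMS_PER_POUND

# Hypothetical pair: 2,600 gr and 2,200 gr.
example_share = discordance_share(2600.0, 2200.0)  # 400 / 2600

print(round(share_of_mean, 1), round(in_pounds, 2), round(example_share, 3))
```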



Figure 2 makes clear that twins have a dramatically different distribution of birth 
weight than do singletons. The mean twin birth weight during our time period 
(2,410 gr) is 27.9 percent smaller than the mean singleton birth weight of 3,342 gr. 
For both twins and singletons the birth weight distribution of children observed 
in the test score data is identical to the distribution of all children born in Florida. 
Of twins, 53.2 percent have birth weights below 2,500 gr (considered clinically low 
birth weight), as compared with 5.9 percent of singletons, while 7.1 percent of twins 
have birth weights below 1,500 gr (considered clinically very low birth weight), as 
compared with 0.9 percent of singletons. 



¹⁰ Blickstein and Kalish (2003) provide an overview of the literature on growth restriction explanations for birth 
weight discordance. In addition, there are some medical reasons which might lead to birth weight discordance; for 
example, Kent et al. (2011) find that noncentral placental cord insertion leads to birth weight discordance in some 
pregnancies. Breathnach and Malone (2012) survey the literature on fetal growth disorders in twin gestations. 

¹¹ There exists medical evidence that large birth weight discordances lead to increased chances of severe disability. 
For instance, Luu and Vohr (2009) find that the likelihood of cerebral palsy in a twin is four times greater 
when birth weight discordance is over 30 percent than when it is less than 30 percent. 



3928 



THE AMERICAN ECONOMIC REVIEW 



DECEMBER 2014 




[Figure 2 plot not reproduced. Horizontal axis: birth weight (grams), 0 to 6,000.]

Figure 2. Difference in Birth Weight Distributions among Singletons and Twins Born in Florida between 1992 and 2002 

Notes: Figure 2 plots kernel density distributions of infant birth weight for all singletons (solid 
gray line) and twins (solid black line) born in Florida between 1992 and 2002 as well as infant 
birth weight distribution of singletons (dashed black line) and twins (dashed gray line) that 
were successfully matched to Florida public school records. 



II. Empirical Framework 

Our empirical framework largely follows what has become standard in the literature. 
For our twins' analysis, we estimate twin fixed effect models in which the 
regressor of interest is the natural logarithm of birth weight.¹² Following Almond, 
Chay, and Lee (2005), henceforth ACL, and Black, Devereux, and Salvanes 
(2007), henceforth BDS, let 

(1)  $y_{ijk} = \alpha + \beta \ln(bw)_{ijk} + x_{ijk}'\gamma + \phi_{jk} + \varepsilon_{ijk},$

where $i$ indexes individuals, $j$ indexes mothers, $k$ indexes births, $y_{ijk}$ denotes the 
outcome of child $i$, born to mother $j$ in twin-pair $k$, $x$ is a vector of child-specific 
determinants of the outcome (in the case of twins, child gender and within-twin-pair 
birth order), $\phi$ denotes unobservable determinants of the outcome which are specific 
to the mother and birth, and $\varepsilon$ is an error term. We also estimate singleton-specific 
analyses in which we control for a wide range of maternal characteristics, as well 
as (in some specifications) gestational length, to make the comparisons with the twin 
specifications as close to apples-to-apples as possible. Our results are invariant to whether or 
not we condition on geography. 

Our outcome, denoted y, is a test score — the criterion-referenced Florida 
Comprehensive Assessment Test (FCAT) — which is standardized within grade 
and year to have mean zero and standard deviation one in the entire population of 



¹² We follow an analogous approach regarding sibling comparisons. 
children in Florida. For ease of presentation, we average standardized reading and 
mathematics FCAT scores for our dependent variable, but our results are qualitatively 
similar for reading and mathematics, and the test-specific results are available 
on request.¹³ The regressor of interest, $\ln(bw)$, is the natural logarithm of birth 
weight in grams. In Section VI we present results from specifications other than the 
linear-in-log model, but the linear-in-log model appears to fit the data well. 
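
A minimal sketch of the standardization step, assuming raw scores are stored as records with hypothetical fields `grade`, `year`, and `score`: within each grade-by-year cell, subtract the cell mean and divide by the cell standard deviation. The data are made up, and the population standard deviation is used because the text standardizes over the entire population of tested children. This is an illustration under those assumptions, not the authors' code.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def standardize_within_cells(records):
    """Return z-scores computed separately within each (grade, year) cell."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["grade"], rec["year"])].append(rec["score"])
    # Mean and population standard deviation of each cell.
    stats = {key: (fmean(vals), pstdev(vals)) for key, vals in cells.items()}
    z_scores = []
    for rec in records:
        mean, sd = stats[(rec["grade"], rec["year"])]
        z_scores.append((rec["score"] - mean) / sd)
    return z_scores

def composite(z_reading, z_math):
    """Average the standardized reading and mathematics scores."""
    return (z_reading + z_math) / 2.0

# Made-up raw scores for two grade-by-year cells.
records = [
    {"grade": 3, "year": 2005, "score": 280.0},
    {"grade": 3, "year": 2005, "score": 300.0},
    {"grade": 3, "year": 2005, "score": 320.0},
    {"grade": 4, "year": 2006, "score": 1400.0},
    {"grade": 4, "year": 2006, "score": 1500.0},
    {"grade": 4, "year": 2006, "score": 1600.0},
]
z = standardize_within_cells(records)
```

A cell in which every student has the same score would have zero standard deviation and would need separate handling; the sketch does not cover that case.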

Ordinary least squares (OLS) estimation of (1) would produce biased estimates 
of $\beta$ if $\phi_{jk}$ were correlated with $\ln(bw)_{ijk}$, in other words, if there were unobservable 
determinants of cognitive ability correlated with birth weight. To address the potential 
bias due to correlation between $\phi_{jk}$ and $\ln(bw)_{ijk}$, we estimate a twin fixed effect 
model. Twins necessarily share the same $\phi_{jk}$. Essentially, a twin fixed effect model 
differences out any mother- or birth-specific confounder and identifies $\beta$ based on 
between-twin variation in test scores and birth weight. Logically, birth weight can 
vary due either to variation in gestation length or to variation in fetal growth rates. 
By focusing on twins, we necessarily hold gestation length constant. Our estimates 
are identified, therefore, by variation in fetal growth rates. We also present evidence 
from singleton births that, while they lack the internal validity of the twin comparisons, 
allow us to show the relationships between gestation length, birth weight, and 
cognitive skills in the overall population of children. 
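
The identification argument can be illustrated with simulated data. The sketch below is not the authors' estimation code: it generates hypothetical twin pairs in which a pair-level unobservable (standing in for the mother- and birth-specific term in equation (1)) raises both log birth weight and test scores, then compares pooled OLS with the within-pair (twin fixed effect) estimator. It omits the child-level controls, and all parameter values are invented for the illustration.

```python
import random

def simulate_pairs(n_pairs, beta, seed=0):
    """Simulate (log_bw, score) for both twins in each pair."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        phi = rng.gauss(0.0, 1.0)  # shared mother- and birth-specific factor
        twins = []
        for _ in range(2):
            log_bw = 7.8 + 0.3 * phi + rng.gauss(0.0, 0.1)
            score = beta * log_bw + phi + rng.gauss(0.0, 0.1)
            twins.append((log_bw, score))
        pairs.append(twins)
    return pairs

def pooled_ols_slope(pairs):
    """Slope from a pooled regression of score on log birth weight."""
    xs = [x for pair in pairs for x, _ in pair]
    ys = [y for pair in pairs for _, y in pair]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def twin_fe_slope(pairs):
    """Within-pair estimator for pairs of two twins.

    With two observations per pair, demeaning within the pair is equivalent
    to regressing the score difference on the log birth weight difference
    without an intercept.
    """
    num = den = 0.0
    for (x1, y1), (x2, y2) in pairs:
        dx, dy = x1 - x2, y1 - y2  # the shared factor cancels in both
        num += dx * dy
        den += dx * dx
    return num / den

TRUE_BETA = 0.4  # invented value, not an estimate from the paper
pairs = simulate_pairs(20_000, TRUE_BETA)
ols_hat = pooled_ols_slope(pairs)
fe_hat = twin_fe_slope(pairs)
print(f"pooled OLS: {ols_hat:.2f}, twin fixed effects: {fe_hat:.2f}")
```

In this simulation the pooled OLS slope is pushed far above the true coefficient by the shared factor, while the within-pair estimator recovers it closely, which is the sense in which the twin comparison differences out mother- and birth-specific confounders.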

One potential internal validity concern is that we can only make use of test score 
data for a twin pair if both members of the pair have test scores. If one twin is present 
in the test score data but not the other, and the reasons for differential inclusion 
in the data are correlated with neonatal health, the absence of one twin's test 
score could present a source of bias. A related concern is that we 
only observe education records for individuals born in Florida who remained in 
Florida, attended Florida public schools and took the FCAT. Various tests reported 
in detail in Figlio et al. (2013) suggest that in practice the selection bias resulting 
from either of these sources is likely to be minimal. For example, the likelihood 
of leaving the sample between third grade and fourth or fifth grade is uncorrelated 
with whether the twin is the heavier or lighter of the pair, and only slightly more 
likely for the lighter twin in grades six through eight. The relative number of miss- 
ing twins is too small to make a meaningful difference in the estimates even in these 
later grades. Furthermore, estimates in which we impute very low or very high test 
scores for missing twins yield almost identical results as those reported in the main 
specifications. 



III. Preliminary Results: Heavier versus Lighter Twins 



To fix ideas before presenting the main regression results, Figure 3 shows the average 
within-twin-pair difference in average math and reading test score between the 
higher birth-weight twin and the lower birth-weight twin for grades three through 
eight. Within twin pairs, on average the heavier twin scores about 5 percent of a 



¹³ We standardize FCAT scores for ease of interpretation. Our results are not substantively changed if instead we 
measure the FCAT in its unstandardized developmental scale score format. 

¹⁴ In the main twins regression specification, 99.5 percent of observations have both math and reading scores, 
0.2 percent have only math, and 0.3 percent have only reading. 

¹⁵ The same patterns for math and reading separately are in Figures A1 and A2 in the online Appendix. 
Figure 3. Average Within-Twin-Pair Difference in Test Scores between Heavier and Lighter Twins 



Notes: Figure 3 plots the difference between the mean test score of the heavier and lighter twin from 
each pair in each grade and the respective 95 percent confidence interval of this difference. 
Mean test score is constructed as an average of scores in mathematics and reading for each 
individual in each grade where we observe both twins. If the score in mathematics is not available 
then only reading is used and vice versa. In each grade we create an average of scores for 
heavier and lighter twins and then calculate the difference between the two. 



standard deviation higher than the lighter twin. This difference in test scores is statistically 
distinguishable from zero, and is stable from third through eighth grades, 
covering ages from approximately 9 to 14.¹⁶ The results imply that neonatal health, 
as proxied by birth weight, has effects on cognitive skills by age nine. Furthermore, 
this effect does not seem to either dissipate or widen through middle school. 
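
The quantity plotted in Figure 3 can be sketched as follows, assuming each twin pair in a given grade is a hypothetical tuple holding (birth weight, score) for each twin. The code takes the heavier twin's score minus the lighter twin's score, averages across pairs, and forms a normal-approximation 95 percent confidence interval from the paired differences. With complete pairs, the difference between the two group means equals the mean of the paired differences. The interval construction and the treatment of ties are our assumptions, since the text does not spell them out, and the data are made up.

```python
from math import sqrt
from statistics import fmean, stdev

def heavier_minus_lighter(pairs):
    """Score of the heavier twin minus score of the lighter twin, per pair."""
    diffs = []
    for (bw_a, score_a), (bw_b, score_b) in pairs:
        if bw_a == bw_b:
            continue  # no heavier twin to define; skipping ties is our choice
        if bw_a > bw_b:
            diffs.append(score_a - score_b)
        else:
            diffs.append(score_b - score_a)
    return diffs

def mean_with_ci(diffs, z=1.96):
    """Mean difference and normal-approximation 95 percent interval."""
    m = fmean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    return m, m - z * se, m + z * se

# Made-up pairs: ((birth weight in grams, standardized score), ...).
pairs = [
    ((2500.0, 0.10), (2300.0, 0.00)),
    ((2100.0, -0.20), (2600.0, -0.10)),
    ((2800.0, 0.50), (2750.0, 0.52)),
    ((2400.0, 0.30), (2400.0, 0.10)),  # tie in birth weight, dropped
]
diffs = heavier_minus_lighter(pairs)
mean_diff, ci_low, ci_high = mean_with_ci(diffs)
```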



Figure 4 breaks down this mean difference by quartile of twin birth weight 
discordance;¹⁷ the bottom and top quartiles average 2.5 and 23.9 percent discordance, 
respectively. Two facts are apparent from this figure: first, the relationship 
between relative birth weight and relative test scores within twin pairs is roughly flat 
as children age. Second, the higher the degree of birth weight discordance, the larger 
the test score gap between the larger and the smaller twin. Figure A3 in the online 
Appendix shows that the positive relationship between birth weight discordance and 
test score differences is present and clear when we break down the twin pairs or sibling 
pairs into fine discordance bins (one for each percentage point, and a final bin 
for pairs with greater than 20 percent discordance), with the slope of the relationship 
modestly flatter for sibling pairs than it is for twin pairs. These findings foreshadow 
the main findings of this paper. 



¹⁶ For all analyses separated by grade, we assign students to the grade they would have been in had they progressed 
one grade per year from the first time we observe them with an FCAT score in third grade. We use this 
"imputed grade" rather than the student's actual grade because grade retention may be affected by birth weight and 
because we are interested in following children longitudinally. All results are extremely similar if we focus on actual 
grade rather than this imputed grade. 

'^We limit this analysis to same-sex twins to ensure that the differences in discordance are not due to well- 
documented differences in birth weight between boys and girls. 
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Difference in means of combined test scores 
by discordance quartiles: Same-sex twins 
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Figure 4. Means of Scores by Discordance Quartiles 

Notes: Figure 4 plots difference between the mean test score of tieavier and lighter twin from 
each pair in each grade for four quartiles of discordance in birtli weiglit. Mean test score is con- 
structed as an average of scores in madiematics and reading for each individuai in each grade 
where we observe both twins. If score in mathematics is not available then only reading is 
used and vice versa. In each grade we create an average of scores for heavier and lighter twins 
and then calculate the difference between the two. Discordance is calculated as the difference 
between heavier and lighter twin birth weight over the weight of the heavier twin. Mean dis- 
cordance for each group in parentheses. 
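
The discordance measure defined in the notes above can be sketched in a few lines. This is only an illustration, not the authors' code: the function names, the rank-based quartile rule, and the birth weights are all hypothetical.

```python
def discordance(bw_a, bw_b):
    """Birth weight discordance: (heavier - lighter) / heavier."""
    heavier, lighter = max(bw_a, bw_b), min(bw_a, bw_b)
    return (heavier - lighter) / heavier


def quartile_groups(values):
    """Assign each value to a quartile (1 = lowest, 4 = highest) by rank.

    Assumption: a simple rank-based split; the paper does not describe
    how ties or uneven group sizes are handled.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * 4 // len(values) + 1
    return groups


# Hypothetical twin pairs: (birth weight of twin A, birth weight of twin B) in grams.
pairs = [(2500, 2450), (2600, 2340), (3000, 2250), (2400, 2280),
         (2000, 1940), (2800, 2380), (3200, 2560), (2700, 2484)]
disc = [discordance(a, b) for a, b in pairs]
groups = quartile_groups(disc)
```

Within each quartile and grade, the plotted quantity would then be the mean score of the heavier twins minus the mean score of the lighter twins.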



IV. Main Results 

A. Pooled Results for Full Sample 

We now turn to our main regression results. The basic regression model is an OLS
estimate which includes twin-pair fixed effects, a gender dummy, and a dummy for
within-twin-pair birth order. The dependent variable is the standardized FCAT score
averaged between reading and math,^18 and the regressor of interest is the natural
logarithm of birth weight in grams. We report some results based on separate regressions
for each grade from third to eighth, and other results that pool test scores
across all six grades. In the pooled regressions, standard errors are clustered at the
individual level (for singletons) and twin-pair level (for twins) to account for the
fact that each individual has up to six observations, one for each grade in which he
or she was tested.^19
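
As a rough illustration of the twin-pair fixed effects specification described above, the sketch below estimates the within-pair slope of test scores on log birth weight by demeaning both variables within each pair. It is a simplified stand-in, not the authors' code: it omits the gender and birth-order dummies and the clustered standard errors, and the function names and the simulated data (including the assumed true slope of 0.44) are hypothetical.

```python
import random
from collections import defaultdict


def twin_fe_slope(rows):
    """Within-pair slope of score on log birth weight.

    rows: iterable of (pair_id, log_birth_weight, score) tuples.
    Demeaning within each pair absorbs the pair fixed effect.
    """
    groups = defaultdict(list)
    for pair_id, x, y in rows:
        groups[pair_id].append((x, y))
    sxy = sxx = 0.0
    for members in groups.values():
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        for x, y in members:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx


def simulate_twins(n_pairs=2000, beta=0.44, seed=0):
    """Hypothetical data: a shared family component shifts both birth weight and scores."""
    rng = random.Random(seed)
    rows = []
    for pair in range(n_pairs):
        family = rng.gauss(0.0, 1.0)
        for _ in range(2):
            log_bw = 7.78 + 0.15 * family + rng.gauss(0.0, 0.1)
            score = beta * log_bw + 0.8 * family + rng.gauss(0.0, 0.05)
            rows.append((pair, log_bw, score))
    return rows


rows = simulate_twins()
beta_fe = twin_fe_slope(rows)  # should land close to the simulated slope of 0.44
```

Because the family component is common to both twins, it drops out of the within-pair deviations, which is why the estimate recovers the simulated slope even though that component is correlated with birth weight.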

The nonparametric plots of the relationship between test scores and birth weight
reported in Figure 5 present evidence supportive of the log birth weight specification
which we employ, as there appears to be a concave relationship between birth
weight and test scores. The figure shows two series, each derived from a test score



^18 See Figlio et al. (2013) for separate findings for reading and mathematics.

^19 An earlier version of this paper (Figlio et al. 2013) clusters standard errors for twins at the individual level.
The level of clustering (individual versus twin or sibling pair) has no substantive effect on our findings. In
grade-by-grade singleton models with one observation per child, we estimate robust standard errors.

Figure 5. Nonparametric Relationship between Birth Weight and Test Scores

Notes: Figure 5 plots coefficients from OLS (black solid line) and twin fixed effects (gray solid
line) models where the dependent variable (y-axis) is the mean of pooled grades 3-8 of
combined mathematics and reading test scores for each individual and the independent variables
(x-axis) are indicators for 36 weight bins corresponding to each individual birth weight. No
additional controls are included in the models.



regression that pools grades 3-8 and both math and reading scores. Each series plots
the coefficients from a set of 36 dummy variables corresponding to 100-gram-wide
birth weight bins. The bins range from a low of 501-600 to a high of 4,001-4,100 gr.
In both regressions, the excluded group is below 501. As was observed in similar
sets of plots by ACL and BDS, the shape of the relationship between test scores and
birth weight is similar whether or not we condition on twin-pair fixed effects.



The main result, an estimated coefficient of 0.443 presented in column 2 of Table 2,
implies that a 10 percent increase in birth weight is associated with just under
one-twentieth of one standard deviation increase in test scores in grades 3-8.^20 The
coefficient is precisely estimated, with a t-statistic of over 10. The fixed effects result is
modestly larger than, but close to, the equivalent OLS coefficient of 0.285 reported
in the first column of Table 2.^21
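
The "just under one-twentieth" reading of the 0.443 coefficient follows from the level-log form of the regression: a 10 percent rise in birth weight changes log birth weight by ln(1.10). The short sketch below spells out that arithmetic; the function name is hypothetical, and the two coefficients are the pooled estimates quoted in the text.

```python
import math


def sd_change(beta, pct_increase):
    """Change in test scores (in standard deviations) implied by a
    coefficient on log birth weight and a percent rise in birth weight."""
    return beta * math.log(1.0 + pct_increase / 100.0)


fe_effect = sd_change(0.443, 10)   # twin fixed effects coefficient (column 2)
ols_effect = sd_change(0.285, 10)  # OLS coefficient (column 1)
```

With these inputs the fixed effects figure comes out at a little over 0.04 of a standard deviation, consistent with the statement in the text.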

To put the magnitude of these coefficients into perspective, BDS estimate that
the effect of log birth weight on log earnings is 0.12. Assuming the log wage return 
to cognitive skills is 0.2 as estimated by Neal and Johnson (1996), our estimates 
imply that increases in cognitive skills present in grades 3-8 explain approximately 
three-quarters of the effect of birth weight on wages found by BDS. Similarly, 
Royer (2009) estimates that a 1,000 gr increase in birth weight is associated with 
an extra 0.16 years of schooling. Using the online analysis tool of the High School 

^20 We also find that birth weight is associated with a modest but strongly statistically significant increase in a
child's grade in school at any given age. In the twin fixed effects model, a 10 percent increase in birth weight is
associated with just under one one-hundredth of a grade higher for any given age; the estimated coefficient on log birth
weight when the dependent variable is grade for age is 0.083 with a standard error of 0.019.

^21 We concentrate on birth weight because there is greater variation in birth weight than in other measures of
neonatal health. That said, we find positive, statistically significant relationships between APGAR scores and test
scores. For instance, in a pooled twin fixed effects model, a one-unit increase in one-minute APGAR scores is
associated with 0.8 percent of a standard deviation higher average reading and math scores.

Table 2. Estimated Effects of Birth Weight on Cognitive Development





                                      Pooled                  Imputed grade
                                      OLS         FE
                                      (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)

Panel A. Twins (average of mathematics and reading): Estimates on ln(birth weight)

All twins                             0.285***    0.443***    0.444***    0.526***    0.431***    0.428***    0.390***    0.376***
                                      (0.022)     (0.039)     (0.043)     (0.045)     (0.047)     (0.053)     (0.057)     (0.061)
                                      [126,636]               [28,434]    [26,508]    [22,970]    [19,340]    [16,186]    [13,198]
Same-sex twins                        0.300***    0.452***    0.463***    0.532***    0.411***    0.469***    0.402***    0.368***
                                      (0.027)     (0.043)     (0.050)     (0.053)     (0.053)     (0.059)     (0.062)     (0.066)
Opposite-sex twins                    0.259***    0.421***    0.399***    0.513***    0.475***    0.330***    0.360***    0.390***
                                      (0.038)     (0.082)     (0.086)     (0.088)     (0.097)     (0.112)     (0.122)     (0.136)

Panel B. Singletons (average of mathematics and reading): Estimates on ln(birth weight) and gestation

ln(birth weight)                      0.285***    -           0.305***    0.289***    0.292***    0.281***    [missing]   [illegible]
                                      (0.004)                 (0.004)     (0.004)     (0.004)     (0.005)     (0.005)     (0.005)
                                      [5,752,665]             [1,254,821] [1,181,590] [1,040,814] [888,895]   [756,478]   [630,067]
ln(birth weight) | gestation weeks    0.332***    -           0.345***    0.336***    0.337***    0.328***    0.313***    0.316***
                                      (0.005)                 (0.005)     (0.005)     (0.006)     (0.006)     (0.007)     (0.007)
ln(birth weight) | gestation weeks    0.421***    -           0.430***    0.424***    0.428***    0.421***    0.399***    0.406***
  [overlapping]                       (0.007)                 (0.008)     (0.008)     (0.009)     (0.009)     (0.010)     (0.011)
Gestation weeks                       0.013***    -           0.015***    0.013***    0.013***    0.012***    0.011***    0.010***
                                      (0.000)                 (0.000)     (0.000)     (0.000)     (0.000)     (0.000)     (0.001)

Panel C. Siblings (average of mathematics and reading): Estimates on ln(birth weight) and gestation

ln(birth weight)                      0.277***    0.238***    0.263***    0.254***    0.241***    0.219***    0.179***    0.178***
                                      (0.009)     (0.011)     (0.012)     (0.013)     (0.015)     (0.017)     (0.021)     (0.026)
                                      [1,110,206]             [294,782]   [267,751]   [212,294]   [156,910]   [109,883]   [68,586]
ln(birth weight) | gestation weeks    0.403***    0.317***    0.345***    0.335***    0.315***    0.344***    0.227***    0.200***
  [overlapping]                       (0.018)     (0.022)     (0.024)     (0.025)     (0.028)     (0.033)     (0.039)     (0.050)
Gestation weeks                       0.012***    0.008***    0.009***    0.009***    0.008***    0.005***    0.006***    0.005**
                                      (0.001)     (0.001)     (0.001)     (0.001)     (0.001)     (0.001)     (0.002)     (0.002)

Notes: Columns 1 and 2 present pooled grade 3-8 results for OLS, twin, and sibling fixed effects models. Columns 3
to 8 present OLS, twin, and sibling fixed effects estimates separately for each of the six grades. Each coefficient
comes from a separate regression. Sample sizes in square brackets reflect number of observations in each regression;
only twin pairs where both twins are observed with test scores in each grade are included; only siblings where
at least two siblings are observed with test scores in each grade are included. All singletons are included except
for the second to last estimate for singletons where only singletons with birth weight in range 847 to 3,600 gr are
included. Siblings could be identified only in about one-half of the population. We include all siblings who have test
scores in given grade. In column 7 we focus only on siblings where the birth weight ranges from 847 to 3,600 gr.
This restriction provides overlapping distribution of birth weight among twins and singletons. The dependent
variables are averaged test scores in mathematics and reading. If the test score in mathematics is not available then
reading is included and vice versa. The main variable of interest is natural logarithm of birth weight. The remaining
independent variables in twin fixed effects models include infant gender and within-twin-pair birth order. OLS
estimates further control for infant birth month and year, marital and immigration status, race and ethnicity, indicators
for maternal age (each for one year), education (high-school dropout, high-school graduate, college graduate), and
number of births (each for one birth). Sibling fixed effects estimates further control for birth order within a family.
Naturally, time-invariant characteristics of the mothers are dropped in sibling fixed effects specifications. Standard
errors in all twin estimates are clustered at twin-pair level. Standard errors in singleton estimates are clustered at
individual level in pooled regressions (column 1) while heteroskedasticity robust standard errors are calculated in
columns 3 to 8 where there is just one observation per individual. Standard errors in all sibling estimates are
clustered at mother level.

*** Significant at the 1 percent level.
** Significant at the 5 percent level.
* Significant at the 10 percent level.

and Beyond dataset, which follows longitudinally a cohort in the middle of Royer's
sample, we estimate that a one standard deviation increase in tenth-grade test scores
is associated with 0.84 additional years of completed education.^22 Combining this
with our finding that a 1,000 gr increase in birth weight is associated with a 0.187
standard deviation increase in test scores, our results imply a 1,000 gr increase in
birth weight is associated with 0.156 additional years of schooling, almost exactly
in line with Royer's findings.
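
The two back-of-the-envelope comparisons in this subsection (to the BDS earnings estimate and to Royer's schooling estimate) can be reproduced from the rounded figures quoted in the text. The variable names below are hypothetical, and the sketch simply multiplies the published numbers; it is not a re-estimation.

```python
# Comparison with BDS: share of the birth weight effect on earnings that
# runs through cognitive skills measured in grades 3-8.
beta_scores = 0.443     # twin fixed effects coefficient on log birth weight (Table 2)
wage_return = 0.2       # log wage return to cognitive skills (Neal and Johnson 1996)
beta_earnings = 0.12    # BDS effect of log birth weight on log earnings
share_of_wage_effect = beta_scores * wage_return / beta_earnings

# Comparison with Royer (2009): implied years of schooling per 1,000 gr.
sd_per_1000_gr = 0.187  # test score gain (in SD) per 1,000 gr of birth weight
years_per_sd = 0.84     # years of education per SD of tenth-grade test scores
implied_years = sd_per_1000_gr * years_per_sd
```

With these rounded inputs the share is roughly 0.74 ("approximately three-quarters") and the implied schooling gain is roughly 0.157 years, against the 0.156 reported in the text and Royer's 0.16; the small gap is presumably a matter of rounding in the quoted inputs.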

Our estimate of the effect of neonatal health on cognitive development is reasonably
large in these terms, but it is worth comparing to other important correlates of
student achievement. Figure 6 shows that the difference in test scores resulting from
differences in birth weight is small compared with differences in achievement associated
with mother's education. Each of the differences between heavier and lighter
twins shown in the figure is statistically significant. However, it is clear that in terms
of math and reading achievement, it is better to be the lighter twin of a college-educated
mother than the heavier twin of a high-school dropout mother. Taken together,
these findings suggest that while nurture can go a long way toward remediating a
child's initial disadvantage, there are still biological factors at play that make it
difficult to fully remediate this disadvantage.^23



B. Results by Grade for Full Sample

A key question of interest is how the cognitive effects of in utero conditions and
neonatal health develop. We have already shown that the effects of birth weight on
cognitive achievement in grades three through eight are similar to those observed
with respect to adult earnings. We next explore how the impact on test scores changes
during these important years for human capital development. Does the effect of birth
weight grow larger as children develop, or does the effect appear by age nine and
remain constant through the upper elementary and middle grades?

The results are presented in columns 3-8 in Table 2. The table shows the estimated
effect of log birth weight from twin fixed effects models that are estimated
separately for test scores from each grade, 3-8. The table shows that the twin fixed
effects estimate of the effect of birth weight on cognitive achievement is already 0.444 by
the third grade, and that the grade-specific estimated effect remains fairly stable
from third through eighth grade, ranging from 0.376 to 0.526. The F-test that the
grade-level estimated effects are identical is rejected at a moderate level of statistical
significance (p = 0.069). However, there is no evidence that this effect follows
a substantial systematic pattern as children progress through school; in a regression
model in which we interact the log of birth weight linearly with grade in school, the
coefficient estimate on the interaction term is one two-thousandth the magnitude of
the coefficient on log birth weight. These results suggest that whatever effect early



^22 We weighted the individuals in the High School and Beyond data by their base year replicate weights. For the
sake of this analysis, we define high-school dropouts as having 10 years of education, GED recipients as having 11,
high school graduates as having 12, certificate recipients as having 13, associates recipients as having 14, bachelors
recipients as having 16, masters or professional degree recipients as having 18, and doctorate recipients as having
19 years of education.

^23 We do not mean to suggest that our results answer the age-old nature/nurture question. Rather, they are
consistent with the growing literature on epigenetics which shows that environmental and biological factors interact
(Miller et al. 2009 or Lam et al. 2012).

Figure 6. Average within-Twin-Pair Difference in Test Scores between Heavier and
Lighter Twin by Maternal Education Categories

Notes: Figure 6 plots means of combined mathematics and reading test scores for lighter and
heavier twins from each pair stratified by maternal education. Black lines correspond to averages
for lighter while gray lines correspond to heavier twins. Solid lines present means for
high-school dropout mothers, dashed lines present means for children of mothers with
high-school diploma or some college while dotted lines present means for college graduates.



health at birth has on cognitive development occurs largely by age nine, and remains
fairly constant throughout the preadolescent and adolescent years.

In a previous version of this paper (Figlio et al. 2013), we look further back, to the
beginning of formal schooling.^24 In various years between 1998 and 2008, Florida
performed universal kindergarten readiness screening. From 1998 through 2001 all
kindergarten entrants were screened with the School Readiness Checklist (SRC),
a list of 17 expectations for kindergarten readiness. Subsequently, kindergarten
entrants were screened with the Dynamic Indicators of Basic Early Literacy Skills
(DIBELS), and beginning in 2006 the results of this screening were collected and
recorded by the Florida Department of Education.^25 DIBELS rates children's letter
sound recognition and letter naming skills and categorizes children as above average,
low risk, moderate risk, or high risk. In our data, 82.1 percent of children were
deemed ready according to the earlier SRC screen, and a very similar 83.8 percent
of children were deemed either above average or low risk according to the DIBELS.
Making use of twin comparisons in a linear probability model,^26 we observe that



^24 There is some reason to believe that the effects of early health deficits may differ between the start of
kindergarten and the end of third grade. At ages 6-8, as children enter full-time schooling, they spend on average
30 percent less time being actively cared for by their parents than they did when they were 3-5 and 43 percent less time
than when they were 0-2 (Folbre et al. 2005). The shift in time spent with parents to time spent with other adults
(such as teachers) and peers (Sacerdote 2001) suggests it may be important to gauge how the effect of neonatal
health on cognitive development changes in the early schooling years.

^25 For more details about the structure and interpretation of DIBELS, see, e.g., Hoffman, Jenkins, and Dunlap
(2009).

^26 The pattern of results and statistical significance is extremely similar when we instead estimate conditional
logit models.



3936 



THE AMERICAN ECONOMIC REVIEW 



DECEMBER 2014 



a 10 percent increase in birth weight is associated with a 0.67 percentage point
increase in being deemed ready for kindergarten according to the school readiness
checklist, and a 1.15 percentage point increase in kindergarten readiness according
to the DIBELS. When we pool the two sets of cohorts, these figures average to a
0.86 percentage point increase.^27 All estimates are statistically distinct from zero at
conventional levels. These results suggest that the effect of neonatal health on cognitive
development is present by age five.

C. Role of Genetic Differences between Twins

For some policy questions, it might be important to isolate the impact of factors
that change intrauterine growth while holding genetics constant. A potential weakness
of our data is that they do not include the zygosity of the twins. However, we
can look at same-sex versus different-sex twins: if genetic differences were driving
a significant portion of the relationship between birth weight and test scores, and
birth weight was positively correlated with positive determinants of later cognitive
skills, we would expect to see a stronger correlation between birth weight and test
scores among opposite-sex twin pairs. As can be seen in the second and third rows of
Table 2, the estimated effect of birth weight is extremely similar for same-sex twins
(0.452) and opposite-sex twins (0.421), suggesting that the estimated relationship
is within the same general range regardless of zygosity. Our finding is consistent
with results reported in BDS, who find no significant difference in the effect of birth
weight on adult earnings between same-sex and opposite-sex twins, nor do they find
a significant difference in the estimated effect of birth weight on earnings for monozygotic
twins and dizygotic same-sex twins in their sample with available zygosity
information.

D. Parallel Results for Singletons 

As mentioned above, our emphasis (and the prevailing emphasis in the literature)
on using twin comparisons to improve internal validity comes at a cost in
terms of external validity. Twins have older and more educated mothers, and weigh
considerably less on average at birth than singletons. In addition, there could be
some unmeasured factor (e.g., a factor associated with in utero fetal competition)
associated with both birth weight and cognitive skills that could compromise our
ability to draw causal inferences about the effects of neonatal health on later test
scores in twin comparison studies. For these reasons, it is valuable to gauge the
degree to which the estimated relationships for singletons compare with the findings
for twins. In our singletons regressions, we further control for a set of background
characteristics: gender, month and year of birth dummies, marital and immigrant
status, race and ethnicity, three dummies for maternal education, and dummies for
age and number of prior births.

The fourth row of Table 2 presents OLS findings for singletons. Two features are
apparent. First, the relationship between log birth weight and test scores is roughly
constant as children grow older, just as it was in the case of twins. Furthermore,
the OLS coefficient for singletons in the pooled model (0.285) is identical to the
comparable OLS coefficient for twins (0.285). This similarity provides the first
piece of evidence about the potential external validity of our twin results.

^27 In Figlio et al. (2013) we go into detail about the metrics one can employ to directly compare the dichotomous
kindergarten readiness assessments to later continuous test scores.

Recall that our twin fixed effects relationship is larger than our twin OLS relationship.
One possible reason for this difference is that the twin fixed effects relationship
conditions effectively on gestational length. In the fifth row of Table 2 we condition 
on gestational length for singletons, and find an OLS coefficient that is somewhat 
larger than was the case without controlling for gestational length. A comparison 
of the results may indicate that the rate of intrauterine growth matters for cognitive 
development, above and beyond the effect of measured birth weight. 

Singletons include some infants whose birth weight is high enough that it likely
indicates an underlying poor maternal health condition such as gestational diabetes,
whereas it is rare for a twin to have a birth weight in this high range. When we
further limit the singletons analysis to the range between 847 and 3,600 gr, the first
and ninety-ninth percentiles of the twin birth weight distribution, we estimate the
OLS relationship between log birth weight and pooled test scores, conditional on
gestational length, to be 0.421, extremely similar to the twin fixed effects finding of
0.443. In sum, the closer we get to shaping the singletons OLS analysis to be parallel
to the twin fixed effects analysis, the closer the two results become. In addition,
as can be seen in row 7 of Table 2, when we look just at the relationship between
weeks of gestation and standardized test scores, we observe that each week of
gestation is associated with just over 1 percent of a standard deviation increase in
test scores.
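
The "overlapping" sample restriction described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the nearest-rank percentile rule is one of several conventions, and treating the 847 and 3,600 gr bounds as inclusive is an assumption.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (pct is an integer from 1 to 100)."""
    ordered = sorted(values)
    k = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[k - 1]


def restrict_to_overlap(singleton_bw, lower=847, upper=3600):
    """Keep singleton birth weights (grams) inside the twin 1st-99th percentile range.

    The default bounds are the values reported in the text; inclusive
    endpoints are an assumption.
    """
    return [bw for bw in singleton_bw if lower <= bw <= upper]


# Hypothetical twin birth weights, used only to show how bounds could be derived.
twin_bw = list(range(800, 3800, 30))
lower_bound = nearest_rank_percentile(twin_bw, 1)
upper_bound = nearest_rank_percentile(twin_bw, 99)

kept = restrict_to_overlap([500, 847, 2000, 3600, 4200])
```

Applying the same bounds to singletons drops the very heavy births that have no counterpart among twins, which is the point of the restriction.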

In a set of counties representing 56 percent of the population of the state of Florida,
we are able to also control for family fixed effects in the singletons analysis. The
results of this sibling analysis are presented in rows 8-10 of Table 2. The estimated
effects of birth weight on test scores in the sibling comparisons tend to be around
three-quarters of the magnitude of the twin fixed effects estimates, but remain in the
same ballpark. The differences in magnitudes are due to the differences between
the sibling comparisons and the twin comparisons, and not the fact that we observe
siblings in a subset of the state, as can be seen when we compare the OLS coefficients
in the sibling subpopulation to those in the overall singletons population. The OLS
coefficient on log birth weight is 0.277 for siblings and 0.285 for singletons, and
the coefficient on log birth weight conditional on gestation in the overlapping sample
is 0.403 versus 0.421 for all singletons. We suspect that the modest differences
between the twin fixed effects models and sibling fixed effects models are due to
factors such as differential parental investments in siblings (Bharadwaj, Eberhard,
and Neilson 2013; Hsin 2012) or direct spillovers between siblings (as we find in
Black et al. 2014).

Since we find that the estimated coefficients on log birth weight are so similar
when we condition on twin fixed effects or when we use the population of singletons
with birth weights in the observed range of twins and condition on gestation length,
a natural next step is to observe whether the distribution of these estimated effects
is the same as well. In Figure 7, we present the estimated marginal effects of log
birth weight on different parts of the cumulative distribution function (CDF) of
the test score distribution, broken down by half-standard-deviation increments, for
twins, singletons, and siblings. This figure demonstrates that additional birth weight
is especially strongly associated with moving children from the range of scores just
below average to the range of scores just above average, and is less strongly related
to test scores far away from the average score.
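
The dependent variables behind Figure 7 are indicators for a test score exceeding successive points of the score distribution. The sketch below builds such indicators at half-standard-deviation steps. It is illustrative only: the function names are hypothetical, and the upper end of the grid (3.5) is an assumption, since the figure notes list only the first thresholds (greater than -3.5, greater than -3, and so on).

```python
def cdf_thresholds(low=-3.5, high=3.5, step=0.5):
    """Grid of cut points in standard-deviation units (upper end assumed)."""
    n_steps = int(round((high - low) / step))
    return [low + i * step for i in range(n_steps + 1)]


def above_threshold_indicators(score, thresholds):
    """One 0/1 indicator per cut point: 1 if the score exceeds it."""
    return [1 if score > t else 0 for t in thresholds]


thresholds = cdf_thresholds()
indicators = above_threshold_indicators(0.2, thresholds)
```

Each indicator would then serve as the outcome in its own regression on log birth weight, so that the plotted coefficient at a cut point is the estimated change in the probability of scoring above it.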

E. Heterogeneity of Results by Gender, Maternal Health, and Background

The diversity of demographics in Florida combined with the size of the dataset
allow us to investigate heterogeneity in the effects of birth weight in ways that have
not been possible in other related work to this point. It is inherently interesting to
learn whether the long-term effects of in utero conditions on cognitive development
vary across demographic and socioeconomic groups. Moreover, examining this
heterogeneity may shed light on the mechanisms by which neonatal health affects
cognitive skills. If the factors of disadvantage (e.g., household income, wealth, and
parental education) are substitutes with neonatal health in the production of cognitive
skills, one should expect to see larger effects of birth weight on test scores
for more disadvantaged groups. If they are complements with neonatal health, one
should expect to see larger effects for more advantaged groups.



Table 3 presents a wide range of heterogeneity findings. For the sake of clarity,
in the table we report the results in which we pool test scores across all grades;
in online Appendix Table A1 we report grade-by-grade results for all subgroups
of the twins analysis. Furthermore, due to space constraints, in the print Appendix
Table A1 we report the group mean test score and birth weight for twins and singletons,
respectively, in each subgroup. The first column in Table 3 reports the mean
and standard error of the estimated effect of birth weight on test scores in a twin
fixed effects model. The second through fourth columns report the parallel findings
for singletons: the estimated coefficient on log birth weight (column 2), log
birth weight conditional on gestation length (column 3),^28 and gestation length
(column 4), while the fifth through seventh columns perform the same analysis when we
condition on sibling fixed effects.

As can be seen in panel A of Table 3, the results are very similar for boys and
girls.^29 While boys are heavier than girls (4.4 percent for twins, 3.8 percent for
singletons), the pooled twin fixed effects estimates for boys and girls are virtually
identical (0.454 and 0.449, respectively). The same is true when we make comparisons
in either the singleton population or in the case of sibling fixed effects.

Panel B of Table 3 stratifies births based on whether the mother has a medical
history that potentially posed a problem for the pregnancy or delivery.^30 Around
one-quarter of mothers have at least one of these risk factors. We observe that the
pooled fixed effects estimates are very similar (0.422 for mothers with medical



^28 In the singleton and sibling specifications conditioning on gestational length, we also limit the range of birth
weights to the approximate twins birth weight range, between 847 and 3,600 gr.

^29 Rosenzweig and Zhang (2009) suggest that there could be important differences by gender in their study's
setting. However, these differences may reflect cultural factors specific to the rural Chinese context.

^30 The specific medical history factors recorded on the birth record are: anemia; cardiac disease; acute or chronic
lung disease; diabetes; genital herpes; hydramnios/oligohydramnios; hemoglobinopathy; chronic hypertension;
pregnancy-associated hypertension; eclampsia; incompetent cervix; previous infant over 4,000 g; previous preterm
or small for gestational age infant; renal disease; RH sensitization; uterine bleeding; and other specified history
factors.

Figure 7. Estimated Effects of Birth Weight on the Position in the Test Score Distribution

Notes: Figure 7 plots estimated effects of log birth weight on the CDF of test scores. Specifically, the top panel
plots coefficients on log birth weight from a series of standard twin fixed effects regressions in which the dependent
variables are indicators marking various points in the CDF of test scores (e.g., greater than -3.5, greater than
-3, etc.). The middle panel plots estimates from analogous regressions that include singletons with birth weights
that overlap with the twin birth-weight distribution. The bottom panel plots estimates from analogous sibling fixed
effects regressions conditional on gestation that include singletons with birth weight which overlap with the twin
birth-weight distribution.

Table 3. Estimated Effects of Birth Weight on Cognitive Development
by Child and Mother Characteristics

                          Twins         Singletons                                Siblings
                          Birth         Birth         Birth weight                Birth         Birth weight
                          weight        weight        | gestation   Gestation     weight        | gestation   Gestation
Sample                    (1)           (2)           (3)           (4)           (5)           (6)           (7)

Panel A
Boys                      0.454***      0.296***      0.440***      0.013***      0.230***      0.321***      0.007***
                          (0.068)       (0.005)       (0.011)       (0.001)       (0.022)       (0.044)       (0.002)
Girls                     0.449***      0.276***      0.407***      0.013***      0.223***      [illegible]   0.008***
                          (0.052)       (0.005)       (0.010)       (0.000)       (0.021)       (0.037)       (0.002)

Panel B
No medical problems       0.449***      0.296***      [missing]     0.011***      0.249***      0.331***      0.007***
                          (0.048)       (0.005)       (0.009)       (0.000)       (0.015)       (0.028)       (0.001)
Medical problems          0.422***      0.249***      0.372***      0.015***      0.244***      0.319***      0.011***
                          (0.066)       (0.006)       (0.013)       (0.001)       (0.032)       (0.063)       (0.003)

Panel C
White                     0.464***      0.293***      0.457***      0.011***      0.244***      0.346***      0.006***
                          (0.045)       (0.005)       (0.009)       (0.000)       (0.015)       (0.030)       (0.001)
Black                     0.392***      0.262***      0.344***      0.015***      0.232***      0.282***      0.011***
                          (0.082)       (0.006)       (0.013)       (0.001)       (0.017)       (0.033)       (0.002)

Panel D
Non-Hispanic              0.436***      0.283***      0.426***      0.012***      0.228***      0.304***      0.007***
                          (0.044)       (0.004)       (0.008)       (0.000)       (0.013)       (0.025)       (0.001)
Hispanic                  0.480***      0.270***      0.384***      0.012***      0.270***      0.357***      0.012***
                          (0.079)       (0.008)       (0.015)       (0.001)       (0.023)       (0.046)       (0.002)

Panel E
Non-immigrant             0.441***      0.284***      0.422***      0.012***      0.223***      0.292***      0.006***
                          (0.044)       (0.004)       (0.008)       (0.000)       (0.013)       (0.024)       (0.001)
Immigrant                 0.456***      0.255***      0.379***      0.013***      0.291***      0.411***      0.012***
                          (0.077)       (0.008)       (0.015)       (0.001)       (0.024)       (0.048)       (0.002)

Panel F
Education below 12 years  0.358***      0.265***      0.368***      0.012***      0.229***      0.303***      0.008***
                          (0.094)       (0.008)       (0.014)       (0.001)       (0.026)       (0.046)       (0.002)
12-15 years               0.439***      0.291***      0.436***      0.013***      0.225***      0.306***      0.008***
                          (0.050)       (0.005)       (0.009)       (0.000)       (0.016)       (0.030)       (0.001)
Above 15 years            0.523***      0.256***      0.380***      0.013***      0.238***      0.418***      0.001
                          (0.079)       (0.010)       (0.020)       (0.001)       (0.031)       (0.059)       (0.003)

Panel G
Bottom                    0.388***      0.289***      0.407***      0.015***      0.250***      0.287***      0.011***
                          (0.076)       (0.007)       (0.013)       (0.001)       (0.020)       (0.038)       (0.002)
Middle                    0.445***      0.269***      0.407***      0.012***      0.221***      0.339***      0.007***
                          (0.072)       (0.007)       (0.014)       (0.001)       (0.024)       (0.047)       (0.002)
Top                       0.447***      0.264***      0.400***      0.011***      0.239***      0.401***      0.004*
                          (0.078)       (0.008)       (0.016)       (0.001)       (0.026)       (0.049)       (0.002)

Panel H
Unmarried                 0.372***      0.269***      0.384***      0.013***      0.235***      0.284***      0.009***
                          (0.076)       (0.006)       (0.011)       (0.001)       (0.018)       (0.034)       (0.002)
Married                   0.482***      0.292***      0.439***      0.012***      0.259***      0.366***      0.007***
                          (0.044)       (0.005)       (0.010)       (0.000)       (0.017)       (0.032)       (0.001)

(Continued)

Table 3 — Estimated Effects of Birth Weight on Cognitive Development
by Child and Mother Characteristics (Continued)

                          Twins         Singletons                                Siblings
                          Birth         Birth         Birth weight                Birth         Birth weight
                          weight        weight        | gestation   Gestation     weight        | gestation   Gestation
Sample                    (1)           (2)           (3)           (4)           (5)           (6)           (7)

Panel I
Age below 22              0.372***      0.268***      0.373***      0.011***      0.195***      0.305***      0.005**
                          (0.115)       (0.007)       (0.014)       (0.001)       (0.025)       (0.046)       (0.002)
22-29                     0.444***      0.274***      0.415***      0.011***      0.249***      0 3 j7***     0.009***
                          (0.059)       (0.006)       (0.012)       (0.001)       (0.022)       (0.042)       (0.002)
30-35                     0.490***      0.294***      0.446***      0.014***      0.228***      0.329***      0.006**
                          (0.069)       (0.007)       (0.015)       (0.001)       (0.034)       (0.066)       (0.003)
Above 35                  0.410***      0.326***      0.490***      0.018***      0.269***      0.335***      0.016***
                          (0.104)       (0.012)       (0.024)       (0.001)       (0.054)       (0.119)       (0.005)

Notes: Column 1 presents pooled grades 3-8 twin fixed effects model estimates corresponding to the model outlined
in column 2 of Table 2. Columns 2 to 4 present estimates for the singleton population. Column 2 presents the
correlation between pooled grades 3-8 test scores and birth weight for all singletons. Column 3 presents the correlation
between pooled grades 3-8 test scores and birth weight conditional on gestation for the sample of singletons that
overlap in birth weight with the twin population, i.e., birth weight in the range 847 to 3,600 gr. Column 4 presents the
correlation between pooled grades 3-8 test scores and gestation weeks for all singletons. Columns 5 to 7 present
estimates for the sibling population. Twin fixed effects regressions control for child gender and birth order. All singleton
models include the following controls: gender, month and year of birth dummies, marital and immigrant status, race
and ethnicity, dummies for maternal education (3 categories), age, and number of births. Sibling models further
control for birth order within a family. Standard errors are clustered at the twin-pair level in column 1, at the
individual level in columns 2 to 4, and at the mother level in columns 5 to 7. Sample sizes are: 126,636 individual-year
observations in column 1; 5,752,665 individual-year observations in columns 2 and 4; 4,025,893 individual-year
observations in column 3; 1,110,206 individual-year observations in columns 5 and 7; and 648,486 individual-year
observations in column 6. There are fewer observations in the zip code income breakdown because we do not observe
income for the years 1992 and 1993. There are fewer observations in the racial breakdown because we exclude races
other than Black or White from this comparison. There are fewer observations in the maternal marital history
breakdown because information is missing for some mothers.

***Significant at the 1 percent level. 
**Significant at the 5 percent level. 
*Significant at the 10 percent level. 

history, and 0.449 for mothers without medical history), as are the log birth weight
coefficients for singletons (for instance, 0.372 for mothers with medical history,
and 0.437 for mothers without medical history in the case where we condition on
gestational length). These results indicate that maternal health at the time of labor
and delivery does not appear to matter much in terms of the effects of birth weight
on cognitive development.

Panels C-I of Table 3 show estimates of the effect of birth weight on pooled
third- through eighth-grade test scores separately by maternal race (panel C),
maternal ethnicity (panel D), maternal immigrant status (panel E), maternal education
(panel F), a proxy for family income: the zip code's median income as of the 2000
census (panel G), maternal marital status (panel H), and maternal age at the time
of the child's birth (panel I). These factors represent a massive range of student
advantage, with average group test scores among twins as low as -0.475 and as
high as 0.663 (see Appendix Table A1), reflecting gaps that are consistent with other
studies of US school children (e.g., Chay, Guryan, and Mazumder 2009). Strikingly,
the twin fixed effects coefficient estimates are similar across this wide
range of groups, with point estimates ranging between 0.358 and 0.523. The OLS
coefficient estimates in the singleton population range from 0.249 to 0.326, and the
OLS coefficient estimates on birth weight conditional on gestation range between
0.344 and 0.490. Sibling fixed effects coefficients conditional on gestation range
from 0.282 to 0.418. Taken together, these results indicate that the effects of birth
weight on test scores are roughly the same for children from a wide range of
different backgrounds.

F. Complementarity of Neonatal Health and Parental Inputs

A close look at the subgroup analysis can provide some evidence regarding the
degree to which neonatal health and parental inputs are complements or substitutes.
One might expect parents with more resources to be better able to remediate the
effects of poor neonatal health. However, whether neonatal health and parental
inputs are complementary is determined by whether parents with more resources are
relatively more effective at building human capital for children of good versus poor
neonatal health, which could happen either because parents with more resources
invest more or because the investments they make have higher returns. Learning
whether parental resources and neonatal health are complementary provides a
window into mechanisms by which parents and early health interact in the human
capital development process.

To explore this question systematically, we pursue an approach similar to that
employed by Hoynes, Miller, and Simon (forthcoming) to study the relationship
between the earned income tax credit (EITC) and rates of low birth weight for
different groups broken down by their rate of EITC usage. In our case, we use
maternal race, maternal ethnicity, maternal immigrant status, maternal marital status,
maternal age, maternal education, and neighborhood income to predict student test
scores in order to construct an index of family socioeconomic status (SES), and
then divide the students into ten mutually exclusive groups; these groups range in
mean predicted test scores from -0.701 to 0.809 in the twins population, a range
greater than a full individual-level standard deviation of the test score distribution.
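
The index construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: predicted scores are taken to be the mean score within each covariate cell (what a fully saturated regression on categorical covariates would predict, which may differ from the specification actually used), ties in predicted scores are broken by input order, and the function names `predict_by_cell` and `assign_groups` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def predict_by_cell(rows):
    """Predict each score as the mean score of its covariate cell.

    rows: list of (covariates_tuple, score) pairs.
    Assumption: a cell mean stands in for the regression prediction.
    """
    cells = defaultdict(list)
    for covariates, score in rows:
        cells[covariates].append(score)
    cell_mean = {c: mean(v) for c, v in cells.items()}
    return [cell_mean[covariates] for covariates, _ in rows]


def assign_groups(predicted, n_groups=10):
    """Rank observations by predicted score and cut into equal-sized groups.

    Returns one group index (0 = lowest predicted scores) per observation.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    groups = [0] * len(predicted)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(predicted)
    return groups


# Toy data: two covariate cells, split into two groups instead of ten.
rows = [(("a",), 0.0), (("a",), 1.0), (("b",), 2.0), (("b",), 4.0)]
predicted = predict_by_cell(rows)
groups = assign_groups(predicted, n_groups=2)
```

Each group's mean predicted score then serves as the horizontal coordinate against which that group's estimated birth weight coefficient is compared.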



Figure 8 plots each group's estimated coefficient on log birth weight against the
group's mean score. We explore the relationship between SES and the effect of birth
weight on children's cognitive development in three different models: the twin fixed
effects model, the sibling fixed effects model conditional on gestation and restricted
to the population of singletons whose birth weights fall within the observed range of
twin birth weights, and the comparable OLS model for singletons.
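
To make the twin fixed effects comparison concrete, the following is a minimal sketch assuming no additional controls (the regressions reported in the tables also control for gender and birth order). With exactly two observations per pair, the fixed effects estimate equals a no-intercept regression of the within-pair score difference on the within-pair difference in log birth weight. The function name and data layout are hypothetical.

```python
import math


def twin_fe_slope(pairs):
    """Within-pair estimate of the coefficient on log birth weight.

    pairs: list of ((weight_g_1, score_1), (weight_g_2, score_2)).
    Anything shared by both twins (family, school, mother) differences out.
    """
    num = 0.0
    den = 0.0
    for (w1, s1), (w2, s2) in pairs:
        dx = math.log(w1) - math.log(w2)  # difference in log birth weight
        dy = s1 - s2                      # difference in test scores
        num += dx * dy
        den += dx * dx
    return num / den


# Synthetic pairs: score = pair effect + 0.44 * log(weight), no noise.
pairs = [
    ((2000, 1.0 + 0.44 * math.log(2000)), (2500, 1.0 + 0.44 * math.log(2500))),
    ((1500, -2.0 + 0.44 * math.log(1500)), (1800, -2.0 + 0.44 * math.log(1800))),
]
slope = twin_fe_slope(pairs)
```

In the synthetic data the pair effects (1.0 and -2.0) drop out, so the recovered slope is the 0.44 used to generate the scores.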

The figure demonstrates two important features of the heterogeneity of birth
weight effects across a wide range of groups stratified by predicted test scores. First,
the estimated effects of birth weight are all within the same general range: between
0.30 and 0.67 in the twin fixed effects model, between 0.29 and 0.48 in the
singletons OLS model, and between 0.24 and 0.45 in the sibling fixed effects model, and
the estimated effects are both statistically and economically significant for every



See Guryan, Hurst, and Kearney (2008) for evidence that more educated parents spend more time in parenting
activities with their children, and for a discussion of how that could theoretically result from either a desire to invest
more or from higher returns.

The groups range in mean test scores from -0.618 to 0.755 in the case of singletons and from -0.696 to 0.817
in the case of sibling fixed effects.

Figure 8. Average Test Scores among Groups and Estimated Birth Weight Effects



Notes: Figure 8 plots the estimates for the ten predicted groups based on the regression of
test scores on maternal race, ethnicity, immigrant origin, marital status, education, age
categories, and income indicators. These groups are not overlapping. In this graph income from
1992 and 1993 is imputed based on observables. Groups are calculated only for individuals
with all information available and for all singletons and siblings with birth weight in a range
of 847 to 3,600 gr.



demographic and socioeconomic group analyzed. These magnitudes would imply
that the effects on cognitive development could account for one-half to all of the
long-term relationship between birth weight and earnings estimated by BDS.

The second pattern the figure illustrates is an upward-sloping relationship between
estimated treatment effects and the subgroup's mean test score. This positive
relationship indicates that the effects of birth weight are larger for relatively advantaged
groups of children than they are for relatively disadvantaged groups of children. The
slopes of the lines plotted in Figure 8 are 0.132, with a standard error of 0.086, in
the case of the twin fixed effects model; 0.136, with a standard error of 0.060, in the
case of the sibling fixed effects model; and 0.083, with a standard error of 0.019, in
the case of the singletons OLS model. The three lines are similar in terms of both
slope and intercept, and indeed, the twin fixed effects and sibling fixed effects lines
are virtually parallel. It is highly unlikely that these results are driven by differential
selection into the sample across groups, at least by birth weight. As an illustration,
the difference in gaps in average birth weight between twin pairs with test scores
and those without test scores ranges from -47 to 82 gr and follows no apparent

We have also estimated specifications in which we interact log birth weight separately with the socioeconomic
variables referenced in Table 3. We then evaluated the marginal effect of log birth weight separately for every child
in the population. The marginal effects in the case of the twin fixed effects specification ranged from 0.17 to 0.62.
Online Appendix Figure A4 plots the estimated marginal effects of log birth weight for the full distribution of
possibilities in this specification.

We estimate the standard errors of the slopes of these lines by bootstrapping. We randomly drew twin pairs
(sibling pairs or singletons) with replacement to generate a sample of the same size as our analysis sample. We then
used this sample to predict test scores and to separate the bootstrapped sample into ten deciles based on predicted
test scores. Next, we estimated twin fixed effects (sibling fixed effects or singleton) models for each of the ten
deciles. For twins, siblings, and singletons alike, we ran 1,000 replications of these 10-observation regressions and
calculated the standard deviation of the resulting slopes to obtain our bootstrapped standard errors.

pattern: the typical gap is just 3 grams for the bottom half of the SES distribution
and 7 grams for the top half. Therefore, while by no means definitive, these patterns
indicate that poor neonatal health may disproportionately affect children growing up
in high socioeconomic status families, and are suggestive that neonatal health and
parental resources are to some degree complementary.
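
The bootstrap used for the standard errors of the Figure 8 slopes can be sketched as follows. This is a simplified illustration, not the authors' code: the helper names are hypothetical, and the `statistic` argument stands in for the full per-replication pipeline (re-predicting test scores, re-forming the ten groups, estimating one coefficient per group, and fitting a line through the ten points).

```python
import random
from statistics import stdev


def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_se(units, statistic, n_reps=1000, seed=0):
    """Standard deviation of `statistic` across resamples of `units`.

    units: the resampling units (twin pairs, sibling pairs, or singletons).
    Each replication draws len(units) units with replacement.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_reps):
        resample = [rng.choice(units) for _ in units]
        draws.append(statistic(resample))
    return stdev(draws)


# Toy usage: the bootstrap standard error of a sample mean.
se_of_mean = bootstrap_se(list(range(10)), lambda s: sum(s) / len(s), n_reps=200, seed=1)
```

Resampling whole pairs rather than individual children keeps the within-pair structure intact in every replication, which is why the description above draws twin pairs or sibling pairs as units.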

V. Effect Variation across the Birth Weight Distribution and by Discordance Levels 

Thus far, we have presented estimates of our baseline model, which specifies that
the relationship between average test scores and birth weight is linear in the log of
birth weight. Understanding how the marginal effect of birth weight varies across
the birth weight distribution and with birth weight discordance may be helpful in
narrowing down potential mechanisms for the relationship. Public health officials
and medical practitioners frequently focus on the thresholds of 1,500 and
2,500 gr, the conventional delimiters of very low birth weight and low birth weight,
respectively. Stronger marginal effects of proportional increases in birth weight for
very low and low birth weight babies might suggest different physiological
mechanisms than if the effects were only present in comparisons between moderate and
high birth-weight infants.

We have already presented nonparametric evidence (Figure 5) that the
relationship between birth weight and student test scores appears to be concave, supporting
the log birth weight specification that is common in the related literature. That said,
there could still be some important nonlinearities in the relationship. In this
subsection we relax the assumptions underlying our main specification and explore how
the marginal effect of poor neonatal health varies across the distribution of birth
weight and with birth weight discordance. First we estimate models that allow the
marginal effect of log birth weight to vary across different bins of the birth weight



distribution. As seen in Figure 9, which presents separate twin fixed effects
coefficients for 20 equally sized bins, based on the lighter-born twin's birth weight, we
observe no systematic relationship between the marginal effect of log birth weight
on test scores and the level of birth weight. The estimated effects are largely stable,



Children in higher scoring subgroups (who tend to have high income, highly educated families with older
mothers) are more likely to have been born with the assistance of in vitro fertilization (IVF) or other assisted
reproduction technologies (ART). It is therefore conceivable that the positive relationship plotted in Figure 8, at
least for the twins population, is due at least in part to differential patterns of IVF/ART. This association could be
especially important in a population of twins, given that Bitler (2008) demonstrates that requiring health insurance
plans to cover use of IVF/ART substantially increases the likelihood that a mother will have twins, and these new
twins likely conceived with the assistance of IVF/ART have lower quality birth outcomes. While we cannot
measure IVF/ART use in our data, we conduct two checks to see whether differential IVF/ART prevalence is a
plausible explanation for our findings. First, we conduct the identical analysis for twins born to mothers aged 30 and
above, versus those under 30. Bitler uses this age breakdown to proxy for IVF/ART likelihood. Next, we conduct
the identical analysis comparing twins who were the first children born to the mother with those who were not,
given that IVF/ART is more likely among families with previous fertility challenges. We do not
find evidence that these slopes differ appreciably across these groups of mothers. Taken together, these results
suggest that differential probabilities that children from high-scoring subgroups were conceived via IVF/ART are not
responsible for the positive-sloped relationship between the scoring level of the subgroup and the subgroup-specific
estimated effect of birth weight on test scores.

We have also estimated models that define the bins based on the heavier-born twin's birth weight. These results
are very similar and are presented in online Appendix Figure A8.

Figure 9. Estimated Effects of Birth Weight, by Weight of Smaller Twin



Notes: Figure 9 plots coefficient estimates from a twin fixed effects regression where the
dependent variable (y-axis) is the mean test score and the independent variables (x-axis) are
the products of log birth weight with indicators for 20 bins reflecting percentiles of the lighter
twin's birth weight. The regression additionally controls for infant gender and birth order within
twin pair. Heteroskedasticity-robust standard errors are used to calculate the 95 percent confidence
interval. Numbers on the x-axis correspond to the mean smaller twin birth weight in each of
the 20 bins.



aside from fluctuations that appear to be due to sampling variation, across the
distribution of birth weight.

We next explore whether the relationship between birth weight and test scores
varies by birth weight discordance in Figure 10. We divide twins into 20 bins by
birth weight discordance, excluding the twin pairs that are very close in weight (less
than 150 gram difference). As can be seen in the figure, the estimated relationship
between log birth weight and test scores is qualitatively similar across a wide range
of discordance.
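
A sketch of the discordance binning described above follows. Two things here are assumptions rather than statements from the text: percentage discordance is taken to be the weight gap as a share of the heavier twin's weight (a common convention), and the 20 bins are taken to be equally sized. The function names are hypothetical.

```python
def percent_discordance(w1, w2):
    """Weight gap as a percentage of the heavier twin's weight (assumed definition)."""
    heavier, lighter = max(w1, w2), min(w1, w2)
    return 100.0 * (heavier - lighter) / heavier


def discordance_bins(pairs, n_bins=20, min_gap_g=150):
    """Drop pairs closer than min_gap_g grams, then split the rest into bins.

    pairs: list of (weight_g_1, weight_g_2).
    Returns a list of n_bins lists, ordered from least to most discordant.
    """
    kept = [p for p in pairs if abs(p[0] - p[1]) >= min_gap_g]
    kept.sort(key=lambda p: percent_discordance(*p))
    n = len(kept)
    return [kept[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


# Toy usage with two bins: the 100 gr pair is excluded.
bins = discordance_bins([(2000, 2100), (2000, 2500), (3000, 2000)], n_bins=2)
```

A separate within-pair coefficient can then be estimated for each bin and plotted against the bin's mean discordance.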

Given the salience in the medical and public health literature of specific birth
weight thresholds (1,500 and 2,500 gr), we next explore whether the estimated
effects of log birth weight in twin fixed effects models differ systematically above
and below 2,500 gr. Rows 2 through 5 of online Appendix Table A2 break down
our estimates into different groups based on the birth weights of the smaller twin.
As can be seen, the estimated effect of a marginal increase in birth weight is quite
similar for pairs with at least one low birth weight (less than 2,500 gr) twin and
those with only normal birth weight (greater than or equal to 2,500 gr) twins; the
estimate for the former is 0.428, and for the latter it is 0.526, and the two pooled
coefficients are not statistically distinguishable from one another. Likewise, the
estimated effects reported in rows 4 and 5 of the table for twin pairs with at least one



An F-test fails to reject the null hypothesis that the coefficient on log birth weight is the same across all 20 bins
(p-value: 0.943).

At very small discordances of less than 3 or 4 percent, the estimates are too noisy to obtain a meaningful result.
We exclude the very small discordances, therefore, so that the results for more meaningful discordances are more
straightforward to present and observe.

Figure 10. Estimated Effects of Birth Weight, by Birth Weight Discordance



Notes: Figure 10 plots coefficient estimates from a twin fixed effects regression where the
dependent variable (y-axis) is the mean test score and the independent variables (x-axis) are
the products of log birth weight with indicators for 20 bins reflecting birth weight
discordance between twins. The regression additionally controls for infant gender and birth order
within twin pair. Heteroskedasticity-robust standard errors are used to calculate the 95
percent confidence interval. Numbers on the x-axis correspond to the mean twin pair percentage
discordance.



very low birth weight (less than 1,500 gr) and those where the smallest twin is low
birth weight (1,500-2,499 gr) twins do not vary substantially across these groups.
The estimated effects for very low birth weight, low birth weight, and normal weight
are, respectively, 0.432, 0.431, and 0.526. Online Appendix Table A2 also presents
other specifications, such as birth weight measured linearly, and birth weight
interacted with the population demeaned mean birth weight in the twin pair, and all sets
of results paint the same fundamental picture.
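
The threshold grouping used in this comparison classifies each twin pair by the weight of its lighter twin. A minimal sketch, with a hypothetical function name and category labels:

```python
def pair_category(w1, w2):
    """Classify a twin pair by the lighter twin's birth weight in grams."""
    lighter = min(w1, w2)
    if lighter < 1500:
        return "very low birth weight"   # at least one twin under 1,500 gr
    if lighter < 2500:
        return "low birth weight"        # smallest twin between 1,500 and 2,499 gr
    return "normal birth weight"         # both twins at 2,500 gr or above


categories = [pair_category(1400, 2600), pair_category(2499, 3000), pair_category(2500, 2700)]
```

Estimating the within-pair coefficient separately within each category gives the three estimates compared in the text.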



VI. School Quality and the Effect of Birth Weight on Test Scores



The results presented thus far have demonstrated that there is a robust relationship
between birth weight and grade 3-8 test scores, and that this relationship is
remarkably stable as children age through preadolescence, across different demographic
groups, and across different socioeconomic groups. The stability of this relationship
is all the more notable because the marginal effect of birth weight does not vary
much across groups that have very different average test scores. Children
growing up in circumstances that lead to very different achievement levels nonetheless
appear to be impacted by early health conditions in similar ways. This finding raises
the question whether investments in children remediate the effect of early deficits
in health.

Schools are an obvious place to look for investments in human capital. In this sec- 
tion we ask whether the effect of birth weight on test scores is different for students 



Additional formal tests supporting the linear in log birth weight specification are described in Figlio et al.
(2013).

who attend high quality versus low quality schools. Students who attend higher
quality schools have higher test scores. But does a lower birth-weight twin perform
better relative to his counterpart if the twin pair attends a high quality school instead
of a low quality school? In other words, does school quality remediate the effect of
early health deficits?

To answer this question, we measure school quality in six different ways. All
are based on test scores; however, the available evidence (e.g., Chetty et al. 2011,
Chetty, Friedman, and Rockoff 2013) suggests that measures of school or teacher
quality based on test scores correlate strongly with later life outcomes. First, we take
advantage of the fact that since 1999 the state of Florida has given each of its
public schools a letter grade ranging from A (best) to F (worst). Initially, this grading
system was based mainly on average proficiency rates on the FCAT. Beginning in
2002, grades were based on a combination of average FCAT proficiency rates and
average student level FCAT test score gains from year to year. We stratify schools
based on average proficiency levels and average student gains from year to year.
In addition, because jurisdictions have made very different determinations about
what it means to be a "good" school, we have coded, to the closest degree possible in
our data, three other highly publicized state/city school grading systems that weight
measures of school quality in substantially different ways: the systems in Indiana,
Louisiana, and New York City.

The results of the school quality analyses are presented in Tables 4 and 5 (similarly
to Table 3 we present mean group test scores and birth weight in the print Appendix
Table A1). Panel A of Table 4 shows estimates separately for twins who attended
schools that received an A, B, and C or below. For reasons due either to school
quality or to selection, test scores are much higher in A-rated schools than in lower
rated schools, and we also observe that twins and singletons who attend higher rated
schools tend to have heavier birth weights than those attending lower-rated schools.
But while there are relationships between school grade, birth weights, and test scores,
there is no monotonic relationship in the association between birth weight and test
scores: the estimated effect of birth weight is largest among twins who attend schools
receiving a grade of B (0.499). The smallest estimated effect is for twins attending A
schools (0.407), and the estimate in the middle is for twins attending C/D/F schools
(0.458). These coefficients are not statistically distinguishable from one another. The
point estimates are even closer together for singletons, where the estimated
coefficient on birth weight varies between 0.273 and 0.284 (0.224 to 0.237 for sibling
pairs) and the estimated coefficient on birth weight conditional on gestational length
ranges from 0.377 to 0.413 (0.276 to 0.333 for siblings).
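
The statement about monotonicity can be checked mechanically on the twin point estimates quoted above. This sketch looks only at the ordering of point estimates and ignores sampling error; the helper name is hypothetical.

```python
def is_monotonic(values):
    """True if the sequence never decreases or never increases."""
    steps = list(zip(values, values[1:]))
    return all(a <= b for a, b in steps) or all(a >= b for a, b in steps)


# Twin fixed effects estimates by awarded school grade, ordered A, B, C/D/F.
twin_estimates = [0.407, 0.499, 0.458]
twins_monotonic = is_monotonic(twin_estimates)
```

Because the estimate rises from A to B schools and then falls for C/D/F schools, the check returns False for the twin estimates.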

Florida's school grades are based in large measure on the school's average FCAT
scores and the school's average student-level FCAT score improvements. Panels B
and C of Table 4 explicitly subdivide schools based on these dimensions. We find
that regardless of whether schools are stratified by average levels of FCAT scores
or average score improvements, the estimated effects of birth weight are present
and approximately the same. For instance, the estimated marginal effect of log birth



If we code the school grades on the scale from 0 (F) to 4 (A), we observe that state-awarded grades correlate
with average school achievement at 0.71 and with growth in achievement at 0.23, while the average achievement
correlates with achievement growth at 0.03.

Table 4 — Results by School Quality Measures

                          Twins         Singletons                                Siblings
                          Birth         Birth         Birth weight                Birth         Birth weight
                          weight        weight        | gestation   Gestation     weight        | gestation   Gestation
Sample                    (1)           (2)           (3)           (4)           (5)           (6)           (7)

Panel A. Awarded grade
A                         0.407***      0.273***      0.412***      0.012***      0.233***      0.333***      0.005***
                          (0.042)       (0.004)       (0.009)       (0.000)       (0.014)       (0.027)       (0.001)
B                         0.499***      0.284***      0.413***      0.012***      0.224***      0.305***      0.007***
                          (0.063)       (0.006)       (0.011)       (0.001)       (0.022)       (0.043)       (0.002)
C and D and F             0.458***      0.275***      0.377***      0.014***      0.237***      0.276***      0.010***
                          (0.076)       (0.006)       (0.012)       (0.001)       (0.021)       (0.040)       (0.002)

Panel B. Average proficiency
Below median              0.437***      0.281***      0.395***      0.014***      0.230***      0.293***      0.010***
                          (0.061)       (0.005)       (0.010)       (0.000)       (0.016)       (0.030)       (0.001)
Above median              0.426***      0.267***      0.404***      0.011***      0.240***      0.348***      0.005***
                          (0.043)       (0.004)       (0.009)       (0.000)       (0.015)       (0.029)       (0.001)

Panel C. Growth in proficiency
Below median              0.453***      0.286***      0.428***      0.012***      0.245***      0.324***      0.008***
                          (0.044)       (0.004)       (0.008)       (0.000)       (0.014)       (0.026)       (0.001)
Above median              0.427***      0.284***      0.413***      0.013***      0.229***      0.281***      0.008***
                          (0.045)       (0.004)       (0.008)       (0.000)       (0.014)       (0.026)       (0.001)


Notes: Column 1 presents pooled grades 3-8 twin fixed effects model estimates corresponding to the model outlined
in column 2 of Table 2. Columns 2 to 4 present estimates for the singleton population. Column 2 presents the
correlation between pooled grades 3-8 test scores and birth weight for all singletons. Column 3 presents the correlation
between pooled grades 3-8 test scores and birth weight conditional on gestation for the sample of singletons that
overlap in birth weight with the twin population, i.e., birth weight in the range 847-3,600 gr. Column 4 presents the
correlation between pooled grades 3-8 test scores and gestation weeks for all singletons. Columns 5 to 7 present
estimates for the sibling population. Twin fixed effects regressions control for child gender and birth order. All singleton
models include the following controls: gender, month and year of birth dummies, marital and immigrant status,
race and ethnicity, dummies for maternal education (three categories), age, and number of births. Sibling models
further control for birth order within a family. Standard errors are clustered at the twin-pair level in column 1, at the
individual level in columns 2 to 4, and at the mother level in columns 5 to 7. In the case of awarded grades, since not all
schools are awarded grades every year, our sample consists of 123,886 observations used in models in column 1,
5,650,536 observations used in models in columns 2 and 4, 3,952,642 observations used in models in column 3,
1,084,620 observations used in models in columns 5 and 7, and 632,125 observations used in column 6. In the case
of average proficiency we use 125,936 observations in models in column 1, 5,731,434 observations in models in
columns 2 and 4, 4,011,368 observations in models in column 3, 1,106,452 observations used in models in columns 5
and 7, and 646,284 observations used in column 6. In the case of growth in proficiency we use 125,566 observations
in models in column 1, 5,716,150 observations in models in columns 2 and 4, 4,000,486 observations in models in
column 3, 1,102,938 observations used in models in columns 5 and 7, and 644,010 observations used in column 6.
The discrepancy between the samples in Table 3 and Table 4 is due to the fact that we do not have data on school
quality for the universe of schools in every year in Florida (in particular, average proficiency and growth cannot be
calculated for a newly established school).
***Significant at the 1 percent level. 
**Significant at the 5 percent level. 
*Significant at the 10 percent level. 



weight for twins attending schools with above median FCAT scores is 0.426, versus
0.437 for twins attending schools with below median FCAT scores, and the
estimated marginal effect for twins attending a school that had above median year-to-year
gains in FCAT scores is 0.427, versus 0.453 for schools with below median gains in
FCAT scores.

Applying other jurisdictions' school grading formulas to Florida's data, as
reported in Table 5, does not change the fundamental conclusion regarding school
quality. We break the Florida school rankings based on each of the three
alternative grading systems into thirds and find several consistent patterns. First,
the estimated relationship between log birth weight and student test scores is strong
and present in all cases. Second, there is rarely a monotonic relationship observed
between the measure of school quality and the coefficient on log birth weight,
whether it is derived from a twin fixed effect model, a sibling fixed effect model,
a singletons model controlling for gestational length, or a singletons
model without controlling for gestation. Third, in the rare cases in which there exists
a monotonic relationship, in one case (singletons in New York City) the pattern runs
counter to that of the other two (sibling fixed effects in Indiana and Louisiana), and
in all cases the coefficient estimates are very similar.

Given that we observe larger estimated effects of birth weight for higher SES
families than for lower SES families, and since higher SES families tend to select
into higher rated schools, it is possible that our finding of no relationship between
measured school quality and the estimated effect of birth weight is biased due to
these differentials. To investigate this possibility, we repeat the school grades
analysis but further stratify the estimated effects of birth weight by predicted
socioeconomic status using the same approach that we followed to generate Figure 8. These
results are presented in online Appendix Table A3. We continue to observe strong,
positive relationships between log birth weight and test scores for all school grade
levels and all predicted socioeconomic groups. In addition, there continues to be no
consistent pattern in these estimated relationships across school grades. For the twin
fixed effect model, the smallest estimated effects are seen in A schools in two of the
three socioeconomic groups (the lower and middle SES groups), but the patterns are
different for singletons. It appears, therefore, that the differential selection of higher
SES families into higher-rated schools is not responsible in a substantial way for
our finding that school quality appears to not affect substantively the relationship
between birth weight and student outcomes.

In summary, the evidence appears to indicate that the effect of birth weight on
test scores does not vary substantially with measures of the quality of schools that
a child attends. One view of this result could be that the effects of in utero health
conditions create a ceiling to learning that cannot be remediated after the fact, at
least by the time that children are of schooling age. Students spend a great deal of
time in schools, and schooling is the primary formal way that human capital investment
takes place during childhood. The amount (Card 1999) and quality (Card and
Krueger 1992a; Card and Krueger 1992b; Krueger and Whitmore 2001; Chetty et
al. 2011; Chetty, Friedman, and Rockoff 2013) of schooling have been shown to
have significant positive impacts on earnings and other outcomes. If attending a
better school improves all students' outcomes in parallel but does not completely
remediate the effects of early health deficits on cognitive development, it may be
that schools currently lack the resources or information necessary to fully remediate
these deficits.

⁴¹ The relationship between gestational length and test scores is monotonic in measured school quality, but the
results across measured school quality are always similarly sized, consistent with our overall findings.

⁴² We have also estimated models in which we control for log birth weight interacted with observable maternal
and socioeconomic characteristics. Our results regarding no apparent relationship between school quality measures
and the estimated effect of log birth weight are fundamentally unchanged when we further condition on these
interaction terms.
Table 5. Results by School Quality Measures:
Running Florida Data through Other State School Grading Systems

                        Twins         Singletons                                Siblings
               Quality  Birth weight  Birth weight  Birth weight  Gestation     Birth weight  Birth weight  Gestation
State          group                                | gestation                               | gestation
                        (1)           (2)           (3)           (4)           (5)           (6)           (7)

New York City  Top      0.389***      0.270***      0.405***      0.011***      0.195***      0.265***      0.004***
                        (0.049)       (0.005)       (0.010)       (0.000)       (0.018)       (0.035)       (0.002)
               Middle                 0.275***      0.407***      0.012***      0.233***      0.318***      0.008***
                        (0.051)       (0.005)       (0.009)       (0.000)       (0.018)       (0.033)       (0.002)
               Bottom   0.484***      0.294***      0.418***      0.014***      0.251***      0.293***      0.011***
                        (0.062)                     (0.011)       (0.001)       (0.020)       (0.038)

Louisiana      Top                    0.263***      0.403***      0.011***      0.232***      0.353***      0.005***
                        (0.048)       (0.005)       (0.010)       (0.000)       (0.018)       (0.034)       (0.002)
               Middle   0.480***      0.283***      0.409***      0.013***      0.241***      0.319***      0.008***
                        (0.054)       (0.005)       (0.010)       (0.000)       (0.018)       (0.035)       (0.002)
               Bottom   0.450***      0.267***      0.360***      0.015***      0.218***      0.250***      0.010***
                        (0.104)       (0.008)       (0.015)       (0.001)       (0.028)       (0.052)       (0.002)

Indiana        Top      0.401***      0.260***      0.395***      0.011***      0.217***      0.330***      0.005***
                        (0.047)       (0.005)       (0.010)       (0.000)       (0.017)       (0.034)       (0.001)
               Middle   0.522***      0.286***      0.415***      0.013***      0.236***      0.290***      0.008***
                        (0.054)       (0.005)       (0.010)       (0.000)       (0.019)       (0.034)       (0.002)
               Bottom   0.434***      0.276***      0.384***      0.015***      0.243***      0.274***      0.010***
                        (0.097)       (0.007)       (0.014)       (0.001)       (0.026)       (0.049)       (0.002)

Notes: Column 1 presents pooled grades 3-8 twin fixed effects model estimates corresponding to the model outlined
in column 2 in Table 2. Columns 2 to 4 present estimates for the singleton population. Column 2 presents the
correlation between pooled grades 3-8 test scores and birth weight for all singletons. Column 3 presents the correlation
between pooled grades 3-8 test scores and birth weight conditional on gestation for the sample of singletons that
overlap in birth weight with the twin population: i.e., birth weight in the range 847-3,600 gr. Column 4 presents the
correlation between pooled grades 3-8 test scores and gestation weeks for all singletons. Columns 5 to 7 present
estimates for the sibling population. Twin fixed effects regressions control for child gender and birth order. All singleton
models include the following controls: gender, month and year of birth dummies, marital and immigrant status, race
and ethnicity, dummies for maternal education (three categories), age, and number of births. Sibling models further
control for birth order within a family. Standard errors in column 1 are clustered at the twin-pair level, in columns 2 to
4 at the individual level, and in columns 5 to 7 at the mother level. In the case of awarded grades, since not all schools are
awarded grades every year and not every system was functioning through the same time period, our samples differ.
The New York system simulation consists of 107,794 observations used in models in column 1, 4,972,962 observations
used in models in columns 2 and 4, 3,471,424 observations used in models in column 3, 850,751 observations used
in models in columns 5 and 7, and 493,281 observations used in models in column 6. The Louisiana system simulation
consists of 108,926 observations used in models in column 1, 5,027,615 observations used in models in columns 2
and 4, 3,508,071 observations used in models in column 3, 850,751 observations used in models in columns 5 and
7, and 493,281 observations used in models in column 6. The Indiana system simulation consists of 107,798
observations used in models in column 1, 4,973,114 observations used in models in columns 2 and 4, 3,471,516 observations
used in models in column 3, 850,751 observations used in models in columns 5 and 7, and 493,281 observations
used in models in column 6.

***Significant at the 1 percent level. 
**Significant at the 5 percent level. 
*Significant at the 10 percent level. 

An alternative view of the results is that school quality does not affect remediation
differentially, but leaves open the possibility that remediation could happen. This
view is supported by a few observations. The difference in birth weights between
twins or siblings is probably far more noticeable to parents than to classroom teachers.
To parents a 15 percent difference in twins' or siblings' birth weight would
be noticeable, but to a teacher 9 to 14 years later, children's initial birth weights
would be insignificant compared to the cognitive achievement she observes in the
classroom. Even differences in cognitive achievement resulting from large discordances
in birth weight among twins or siblings probably appear to the teacher to be
the result of temperamental differences. Recall that the difference in achievement
between the average high and low birth-weight twin is far less than the difference
in achievement between children born to college-educated and high-school dropout
mothers. Given this discrepancy, it is likely that teachers treat twins or siblings (or,
for that matter, children who are similar along some other dimension) similarly. The lack of
relative improvement of children with poor neonatal health in better-rated schools
may not indicate that it is impossible to remediate. Rather, it may indicate that it is
not done, or at least not done systematically.
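
As a rough back-of-the-envelope illustration of why such a gap could go unnoticed in the classroom, the sketch below converts a 15 percent birth weight difference into an implied test score gap. It assumes a coefficient on log birth weight of about 0.43 (within the range of the twin fixed effects estimates reported above) and assumes that test scores are measured in standard deviation units. The function name is hypothetical and the calculation is an illustration, not a result reported in the paper.

```python
import math


def implied_score_gap(coef_log_bw: float, pct_heavier: float) -> float:
    """Implied test score gap between two children when one is
    pct_heavier (e.g. 0.15 for 15 percent) heavier at birth, given a
    coefficient on log birth weight."""
    # The difference in log birth weight is log(1 + pct_heavier).
    return coef_log_bw * math.log(1.0 + pct_heavier)


# Assumed, illustrative coefficient of 0.43; not an exact estimate from any one model.
gap = implied_score_gap(0.43, 0.15)
print(f"Implied gap: {gap:.3f} standard deviations")
```

Under these assumptions the implied gap is about 0.06 standard deviations, small relative to the differences in mean test scores across maternal education groups reported in the Appendix.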

VII. Conclusion 

Using a unique population level data source from Florida, we present the first look
at the effects of poor neonatal health on child cognitive development in a highly
developed context, provide the first comprehensive study of the differential effects
on a wide range of different demographic and socioeconomic groups, and offer
the first exploration of the degree to which school quality might influence these
effects. Our results are remarkably consistent: children with higher birth weight
enter school with a cognitive advantage that appears to remain stable through the
elementary and middle school years. The birth weight related patterns in test score
performance observed in twins are also seen in the overall population of singletons.
The estimated effects of low birth weight are present for children of highly educated
and poorly educated parents alike, for children of both young and old mothers, and
for children of all races and ethnicities, parental immigration status, parental marital
status, and other background characteristics. The estimated effects of neonatal
health are of roughly the same magnitude throughout the tested grades as they are at
the beginning of kindergarten (Figlio et al. 2013), and even as they are in very early
childhood (Hart 2008).⁴³ The estimated effects are just as pronounced for students
attending highly performing public schools (measured in a variety of ways) as they
are for students attending poorly performing public schools. These results strongly
point to the notion that the effects of poor neonatal health on adult outcomes are
largely determined early: in early childhood and the first years of elementary school.

This pattern persists despite parental attempts to provide different experiences 
to their different children in early childhood. Bharadwaj, Eberhard, and Neilson 
(2013) and Hsin (2012), for example, find evidence that parents tend to invest more 
in lower birth-weight children than they do in higher birth-weight children, indicat- 
ing a desire for remediation. While our administrative data do not offer the types of 
survey data used in those two papers, we see evidence of parents actively and simul- 
taneously making different choices for their twins, suggesting that parents recognize 
developmental differences in their children and seek to remediate these differences 
in early childhood. It is reasonably common in Florida for parents to send one twin 
to preschool but not the other (true in 7.6 percent of twin pairs and 8.9 percent of 
twin pairs in which the birth weight discordance is greater than 20 percent). In
9.2 percent of twin pairs (10.5 percent of twin pairs with discordance greater than
20 percent) parents choose different preschool arrangements for their twins by either
sending one twin to preschool but not the other, or sending both twins to preschool
but only one to privately financed preschool. And in just under one percent of cases
(1.2 percent of twin pairs with discordance greater than 20 percent) parents redshirt
one twin but not the other by starting twins in school at different ages.⁴⁴

⁴³ Hart's (2008) study of a much smaller set of twins in the ECLS-B finds estimated effects of birth weight on the
Bayley Scales of Infant Development that are close in effect size to those presented in our paper.

Children with poor neonatal health who come from highly educated families perform
much better than those with good neonatal health who come from poorly educated
families, indicating that nurture can at least partially overcome nature. Indeed,
this finding is very much in keeping with the literature on the positive relationship
between household income and health status in childhood and adulthood (see, e.g.,
Case, Lubotsky, and Paxson 2002). Still, the fact that these initial biological factors
are not fully overcome for even the most affluent and educated families (and,
indeed, that the estimated effects of log birth weight are actually somewhat higher
for these families) is consistent with the notion that parental inputs and neonatal
health are complements rather than substitutes. While what exactly parents do to
remediate initial biological disadvantage successfully and what schools and parents
could do potentially in early childhood and the early elementary grades and beyond
to continue to remediate are open questions, this study provides numerous indications
that poor neonatal health establishes a stable trajectory for children's cognitive
development.

These findings have potential implications for both health and education policy
and practice. While it is premature to suggest specific policy responses based on
this work, these findings indicate some potentially fruitful places to look for additional
evidence. On the health side, for example, it will be valuable to learn whether
improvements in earnings by families with pregnant women, improved maternal
nutrition, or reduced maternal stress (all factors associated with higher birth weight)
also translate to better cognitive outcomes in childhood. On the education side, it
will be important to learn whether the relationship between birth weight and cognitive
outcomes is attenuated in cases in which health and education providers have
more interaction, such as in the case of children who participate in early intervention
pre-kindergarten programs. Understanding these types of relationships will help us
to modify the mechanisms through which neonatal health affects cognitive outcomes
in childhood and adulthood, and guide health and education policy and practice.



⁴⁴ In cases of differential redshirting, parents are slightly more likely to redshirt the lighter twin than they are to
redshirt the heavier twin. We discuss differential redshirting in greater detail in Figlio et al. (2013).

Appendix

Table A1. Mean Test Scores and Birth Weight for Groups Used in Tables 3, 4, and 5

                          Mean test score                       Mean test score
                          [mean birth weight]                   [mean birth weight]
                          Twins     Singletons                  Twins     Singletons
Sample                    (1)       (2)         Sample          (3)       (4)

Table 3                                         30-35           0.280     0.305
Boys                      0.049     0.051                       [2,467]   [3,390]
                          [2,473]   [3,397]     Above 35        0.342     0.306
Girls                     0.101     0.119                       [2,480]   [3,353]
                          [2,369]   [3,274]     Table 4
No medical problems       0.076     0.098       A               0.278     0.276
                          [2,457]   [3,359]                     [2,437]   [3,365]
Medical problems          0.074     0.041       B               -0.093    -0.039
                          [2,356]   [3,259]                     [2,410]   [3,323]
White                     0.258     0.228       C and D and F   -0.399    -0.310
                          [2,457]   [3,393]                     [2,375]   [3,266]
Black                     -0.465    -0.362      Below median    -0.339    -0.248
                          [2,318]   [3,180]                     [2,382]   [3,281]
Non-Hispanic              0.099     0.110       Above median    0.298     0.298
                          [2,413]   [3,333]                     [2,442]   [3,371]
Hispanic                  -0.034    0.001       Below median    0.046     0.058
                          [2,454]   [3,346]                     [2,421]   [3,337]
Non-immigrant             0.074     0.079       Above median    0.100     0.106
                          [2,414]   [3,334]                     [2,422]   [3,335]
Immigrant                 0.082     0.106       Table 5
                          [2,452]   [3,344]     New York City
Education below 12 years  -0.475    -0.339      Top             0.305     0.291
                          [2,339]   [3,252]                     [2,440]   [3,363]
12-15 years               0.005     0.095       Middle          0.053     0.073
                          [2,430]   [3,348]                     [2,427]   [3,336]
Above 15 years            0.663     0.677       Bottom          -0.180    -0.120
                          [2,451]   [3,417]                     [2,395]   [3,308]
Bottom                    -0.216    -0.138      Louisiana
                          [2,393]   [3,285]     Top             0.375     0.371
Middle                    0.121     0.085                       [2,448]   [3,381]
                          [2,410]   [3,337]     Middle          -0.090    -0.024
Top                       0.437     0.381                       [2,414]   [3,325]
                          [2,434]   [3,382]     Bottom          -0.489    -0.377
Non-married               -0.359    -0.234                      [2,365]   [3,250]
                          [2,336]   [3,237]     Indiana
Married                   0.273     0.277       Top             0.359     0.349
                          [2,459]   [3,396]                     [2,448]   [3,376]
Age below 22              -0.394    -0.207      Middle          -0.068    -0.006
                          [2,268]   [3,237]                     [2,412]   [3,328]
22-29                     -0.005    0.076       Bottom          -0.450    -0.352
                          [2,419]   [3,357]                     [2,369]   [3,259]

Notes: Descriptive statistics for each group reported in Tables 3, 4, and 5. These present mean combined mathematics
and reading test scores as well as mean birth weight for twins and singletons, respectively.